Lecture Notes in 
Computer Science 1681 



David A. Forsyth Joseph L. Mundy 
Vito di Gesu Roberto Cipolla (Eds.) 



Shape, Contour 
and Grouping 
in Computer Vision 





Edited by G. Goos, J. Hartmanis and J. van Leeuwen 




Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo








Series Editors 



Gerhard Goos, Karlsruhe University, Germany 
Juris Hartmanis, Cornell University, NY, USA 
Jan van Leeuwen, Utrecht University, The Netherlands 



Volume Editors 
David A. Forsyth 

University of California at Berkeley, Computer Science Division 
Berkeley, CA 94720, USA 
E-mail: daf@cs.berkeley.edu 

Joseph L. Mundy

G.E. Corporate Research and Development 
1 Research Circle, Niskayuna, NY 12309, USA 
E-mail: mundy@crd.ge.com 

Vito di Gesu

Palermo University, C.I.T.C. 

Palermo, Sicily, Italy 

E-mail: digesu@dipmat.math.unipa.it 

Roberto Cipolla 

University of Cambridge, Department of Engineering 
Cambridge CB2 1PZ, UK
E-mail: cipolla@eng.cam.ac.uk 



Cataloging-in-Publication data applied for 

Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Shape, contour and grouping in computer vision / David A. 
Forsyth ... (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong
Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999 
(Lecture notes in computer science ; Vol. 1681) 

ISBN 3-540-66722-9 

CR Subject Classification (1998): I.4, I.3, I.5, I.2.10
ISSN 0302-9743 

ISBN 3-540-66722-9 Springer-Verlag Berlin Heidelberg New York



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law. 

© Springer-Verlag Berlin Heidelberg 1999
Printed in Germany 

Typesetting: Camera-ready by author 

SPIN 10704339 06/3142 - 5 4 3 2 1 0 Printed on acid-free paper




Preface 



Computer vision has been successful in several important applications recently. 
Vision techniques can now be used to build very good models of buildings from 
pictures quickly and easily, to overlay operation planning data on a
neurosurgeon’s view of a patient, and to recognise some of the gestures a user makes to
a computer. Object recognition remains a very difficult problem, however. The 
key questions to understand in recognition seem to be: (1) how objects should 
be represented and (2) how to manage the line of reasoning that stretches from 
image data to object identity. 

An important part of the process of recognition - perhaps, almost all of it -
involves assembling bits of image information into helpful groups. There is
a wide variety of possible criteria by which these groups could be established -
a set of edge points that has a symmetry could be one useful group; others
might be a collection of pixels shaded in a particular way, or a set of pixels 
with coherent colour or texture. Discussing this process of grouping requires a 
detailed understanding of the relationship between what is seen in the image 
and what is actually out there in the world. 

The international workshop on shape, contour and grouping in computer 
vision collected a set of invited participants from the US and the EC to exchange 
research ideas. This volume consists of expanded versions of papers delivered at 
that workshop. The volume contains an extensive introduction consisting of three 
articles: one sketches out common cause on what is understood in recognition, 
and the other two indicate different possible agendas for future research in the 
area. The editors have encouraged authors to produce papers with a strong 
survey aspect, and this volume contains broad surveys of shape representation, 
model selection, and shading models as well as discursive papers on learning in 
character recognition and probabilistic methods in grouping. These papers are 
accompanied by more focussed research papers that give a good picture of the 
current state of the art in research on shape, contour and grouping in computer 
vision. 

It remains for the editors to thank the participants, the sponsoring
institutions (which are listed separately), the management and staff of the Hotel Torre
Artale, and Professor and Mrs. di Gesu who, with help from Roberto Cipolla, 
handled local arrangements admirably. 



August 1999 



David Forsyth 
Introduction



David Forsyth and Joe Mundy



Computer Science Division, U.C. Berkeley, Berkeley, CA 94720, USA 
daf@cs.berkeley.edu, 
http://www.cs.berkeley.edu/~daf



Abstract. Our understanding of object recognition can address the 
needs of only the most stylised applications. There is no prospect of 
the automated motorcars of Dickmanns et al. knowing what is in front 
of them anytime soon; searchers for pictures of the pope kissing a baby 
must search on a combination of text, guesswork and patience; current 
vision based HCI research relies on highly structured backgrounds; and 
we may safely guess that the intelligence community is unlikely to be able 
to dispense with image analysts anytime soon. This volume contains a 
series of contributions that attack important problems in recognition. 



1 What We Do Well 

We can solve some problems rather well. Unfortunately, these problems seem to
connect only rather tenuously with potential applications of object recognition.
This is because all current algorithms for object recognition model
recognition as direct exploitation of correspondence. Each belongs to a small
number of types, each with characteristic individual misbehaviours. Much of
this material is (or should be) common cause, and will be reviewed only very
briefly.



1.1 Geometric Detail for Point-like Primitives 

o nt-1 kpmtvspoj tlkpo nts; po nts p oj t to po nts 1 n s to 1 n s 
on s to on s t . w th no mo ompl x om t h v ou th n o lu- 
s on. Th s m ns th t thou h o j ts n look nt om nt v ws 

th s h n s h hly st u tu . o po nt-1 k p m t v s o spon n s 

twnm nojt tusn s h nv ous w ys o t 1 
om t mo Is ( s mpl o th s 1 1 tu n lu s 2 3 6 7 9 10 

11 18 19 23 2 25 ; th hun sopps Inwthv ntlo- 

thms o sp t hn 1 ssu s). Typ 1 1 o thms o th s 1 ss n usu lly 

n nst n s wn om sm 11 num oojt mo Is (som t m s om 

p m t mis) nst k oun o mo t lutt n sp t th 
ts o o lus on. o 1 ms n lu th st t on to x t om t y; th 1 m- 
t num o p m t V s; th ulty o u 1 n su nt pp op t mo Is; 

th n 1 un 1 1 ty o V t on n s m nt t on; nth st t on to 

sm 11 num o mo Is. 
1.2 Some Cases of Curved Primitives

Th ulty w th u V su s s th t thou h th h n o outl n w th 
V wpo nt s h hly st u tu th s st u tu s n lly ompl t — so om- 
pl t th t t 1 o s s lly unm n 1 . t 1 s known 

out p t ul s s non p t ul ly p t 1 t p s nt ; 11 th s n o m t on 
t n s to su st th t given a geometric class outl n s pow ul oust nt on 
om t y n V wpo nt. Th pp to 1 ss-sp onst nts on out- 

In s o 11 s s thou h only v y w us ul s s known. Th s p tu 
s ompl t y th s n o ny k n o ov nt pp ox m t on th o m. 

know no us ul om t th o ms out th outl n s o su s th t 1- 

most” Ion to p t ul 1 ss n 11 n t ons th t su h th o ms 

ult to t. 



1.3 Template Matching 

T mpl t m t h n wo ks th w 11 on som knso ontonpolm 
( n n ont 1 s s oo x mpl ; wh 1 som t Is n wo k 

out th p o 1 m s ut w 11 un stoo 1 8 1 20 21 16 ). u t mpl t 

m t h n h V s poo ly un h n o sp t n o Hum n t on ut th s 
n solv y opt np mt mlsot mpl t s ( . . 12 13 15 ). 
Th s pp o h u s o j ts to pp on th own n om s unw 1 y 

o 11 ut w so om. o mo ompl x s s mo Is n ons st 

o ompos tso p mtvs wh h th ms Iv s om p m t m 1 s o 

t mpl t s ( . . 1 5 8 1 17 22 ). 

2 What This Volume Describes 

Th utho s o th s p s two u t nt n s o utu s h on 

o j t ont on. Th s n shv nskth nthnxt two p p s. 
n th st o syth u s th t th m n u nt ulty n u 1 n pt- 
1 o n t on syst ms s th poo m n m nt o un t nty w th n thos 

syst ms; som so to ysn oms Ion ov u . t k s th pos t on 
th t m ny ult ssu s — o x mpl wh t s n pp op t p s nt t on 

o ptul stoojtso how shoul st n t u s us to om up 

w th s n 1 o n t on st t y — ss nt lly mp 1 n st t st 1 
n n tu n th t th ommun ty shoul tt mpt n to m st n pply 
V ous st t st 1 m tho s. n th s on un y u s th t mp 1 m th- 
o s no no un m nt 1 kth ou h ut th t tt un st n n o 
pp op t phys In o m 1 mo Is — o x mpl n un st n n o th 

1 t onsh p tw n th om t y o o j ts th su p op t s n th 
m pp n — m ht. Th st o th s volum ons sts o ont ut ons 

om 1 n s hs Inwthshpsh n oup n u nt t on 

n ont on. h s t on ont ns oth s hppsnppswth 

V w n nt o u to y m t 1. 
2.1 Shape

o th V st m jo ty o o n t on p o 1 ms sh p s n mpo t nt u . n 
Shape models and object reeognitionj' n on n oil u s s th 
u nt st t o th t n sh p mo 11 n . Th p p monst t s v ty 
o p s nt t ons o o n t on n In n th wo k n o th o 

n Is yin . Th y s p s nt tons nt ms o p ts” n show 

on w y to t non 1 ompos t on o n o j t nto p ts om n m 

n lly th p p s uss s how th ts o v w n t on th ms Iv s 

t y m si. 

Th 1 t onsh p tw n m m su m nts n o j t sh p s ompl - 

t . Th s It onsh p h s om n to 1 s w 11 s om t sp ts s t n 
Ison shows n Order structure, eorrespondence and shape based categories. ” 
Th o st u tu o oups o po nts o s not h n t ly s th oups 
p oj t to n m ; th s t n us to son out th o j t n 

V w. 

In uvs otnsujtto som o m o oup t on o th y 

snnnm . stn mhn sm o s ount n th s oup t on s 

to p s nt u V y plot o nv nt p op t s nst n nv nt p - 

m t . Th s pp o h usu lly u s th t on 1 to m su n n on- 

V n nt num o v t v s. n Quasi-invariant parametrisations and their 

applications in computer vision” to n poll show how to us u s - 
nv ntp mtstono psntn uvs. Th s h s th v nt th t 

w V t V s n m su on s w 11 n to pt som v t on n 

th p s nt t on. 

2.2 Shading 

Hum n t on s n mpo t nt sou othv tonnojtpp n.n 
Representations for reeognition under variable illumination,” K m n 1- 
hum un ohss v 11 th illumination cone wh h 

o s th pp n o n o j t un 11 poss 1 Hum n nts. Th y show 
th t th s onv X on n us to ons s sp t su st nt 1 sh n 

V t ons wh h on oun th usu 1 st t s. Th s th o y os not t t sh - 

ows wh h It w th n Shadows, shading, and projective ambiguity,” y 

Ihum u K m n n u 11 . j ts th t v th s m sh ow p tt n n 

X v w om t lly u v 1 nt up to generalised s- 1 ambiguity, 

u th mo o j ts n oust u t up to th s m u ty om mult pi 

m s un mult pi unknown 1 ht sou s. 

2.3 Grouping 

Grouping is the process of assembling image components that appear to belong
together. There are a variety of reasons that components may belong together.
In "Grouping in the normalized cut framework," Malik and colleagues describe a
mechanism that segments an image into components that satisfy a global
goodness criterion. The mechanism is attractive because (as they show) it can be
used for a variety of different grouping cues, including intensity, texture,
motion and contour.

noth son th t m ompon nts m y Ion to th s th t th y 1 

on th s m pi n n th wo 1 . n Geometric grouping of repeated elements 

within images^ h 1 tzky n ss m n show how to t m n wh n 

p tt n ons sts o pi n 1 m nts p t o n to v ty o ul s. Th y 

t m n th s 1 m nt o th p tt n n n th n oust u t th p tt n 

us p 1 1 on ul s on th pi n 1 to p 1 1 on ul s n th m n 
w 11 n w y. 

n s t 11 1 m ppl t ons oth th pos o th oun pi n n th 1- 
t on o th m usu lly known. Th s m ns th t on n t 11 wh th 

o j ts ly n on th oun pi n pp to h v symm t y s u w n n 
un y monst t n Constrained symmetry for change detection^ Th s u 
m k s t poss 1 to oup to th nt st n m nts wh h 1 k ly 

to h V om om o x mpl hum n t ts. 

n Grouping based on coupled diffusion mapsf o sm ns n n ool 
s th us o n sot op us on p o ss s o oup n . n th s pp o h 

p X Is onn t y us on p o ss th t s mo to p v nt smooth n 
ov 1 nts. Th y show x mpl s wh th s p o ss s us os m n- 

t t on o t t n symm t s o st os op oust u t on n o mot on 

onst u t on. 

2.4 Representation and Recognition 

n Integrating geometric and photometric information for image retrieval,” 
hm ssmnn oh s lolpsnttonom snt ms o 

interest points. Th s nt st po nts n us n photom t n o m t on; 

oil t ons o nt st po nts yl p snttonthtnopo tsnomton 

out sh p n out photom t y. Th s p s nt t on n us to m t h 

u y m s nst n m t s . Th y show how th p s nt t on n 
xt n to 3 u V s us n th os ul t n pi n o th u v to o t n 
mthnposstht nmthS uvs twn sm 11 s 1 n st op s. 

un y n x n s noth pp o h o nt t n photom t- 

n om t n o m t on n Towards the integration of geometric and 
appearance-based object recognition.” Th y p opos us n t mo Is o su - 
htn ss to n x o j t nt ty n omp n n s us n th n- 
N y mo 1 o su fl t n to 1 t . Th s pp o h s xt n to 

olou m s n Recognising objects using color annotated adjacency graphs,” 
yTu xn n tly. nthspp ojts psnt y j ny 

phs o olou s wh h mth usn phmth. Th p- 

p o h us s m tho om In 1 o omput n ph omp t Its 

th t s som wh t m n s nt o th m tho o \'k et al.. 

In "A cooperating strategy for object recognition," Chella, Di Gesu, Infantino,
Intravaia and Valenti describe a complete recognition system. Objects are
represented using the interest points of a discrete symmetry transform; the
system uses cooperating agents to mediate the crucially important relationship
between top-down and bottom-up information flow.



2.5 Statistics, Learning, and Recognition 

Methods from statistics and from statistical learning theory are starting to have
a substantial impact on the practice of computer vision. A classical statistical
problem that turns up in many different vision applications is model selection:
from which of several models was the data set obtained? This issue must be
dealt with in recognition, where it is often known as verification (the issue is:
do the pixels in this region come from an object or the background?), in structure
from motion (what kind of camera produced this scene?), and in a variety of
other areas of vision. Torr reviews this topic in "Model selection for two view
geometry: a review."

Forsyth, Haddon and Ioffe describe probabilistic algorithms for object
recognition in "Finding objects by grouping primitives." These algorithms are
structured around the use of simple primitives. Firstly, people and animals are
represented as cylinders, and can then be found by a grouping process that assembles
cylinders that together "look like" a person; secondly, folds in cloth are found
using a classifier that recognizes their appearance, and then Markov chain
Monte Carlo methods are used to group them into assemblies that look like buckle
patterns in clothing. It is usually difficult to know what to use as a primitive in
this sort of work; the paper argues that the criteria are statistical in nature, and
that primitives could be learned from data.

A more total commitment to learning appears in "Object Recognition
with Gradient-Based Learning," by LeCun, Haffner, Bottou and Bengio. Images
of hand-written characters are filtered by a sequence of filters at various scales,
and the filter outputs passed to a neural net classifier. Not only the classifier
but the filters themselves are learned from example data, using a procedure
known as gradient based learning. Convolutional neural networks are then
combined together to yield a space displacement network that can read a sequence of
hand-written characters. Information about the probabilistic structure of handwritten
numbers is incorporated using a graph transformer network.
i pi r t ri w t t igni nt w kn in urr nt un- 

r t n ing of o j t r ognition. 1 k goo m for u ing unr li 1 

inform tion lik r iom tri m ur m nt tiv ly; int gr ting pot n- 

ti lly ontr i tory u ; r vi ing ypot int pr n of n w inform tion; 

t rmining pot nti 1 r pr nt tion from t ; n uppr ing in ivi u 1 if- 
f r n to o t in tr t 1 . pro 1 m r i ult ut non r 

un ppro 1 giv n ng of mp i in our r r . 

11 t import nt pro 1 m v t ti ti 1 fl vour to t m. Mo t involv 
ng of mp i from t t il tu y of p i u to n inv tig tion 
of t niqu for turning u into int gr t r pr nt tion . n p rti ul r 11 
V t ti ti 1 fl vour n n t oug t of inf r n pro 1 m . ow n 
mpl t t ugg tt tmto of yin inf rn n u tottk 

t i ulti . 

V 1 rg ly m pp out t t g om tri 1 m t o w n . imil rly 
11 t r iom tri inform tion t t on iv ly oul u ful Ir y it. 

li V t t t n t flow ring of u ful vi ion t ori will o ur w n w n- 

g g in n ggr iv tu y of t ti ti n pro ili ti mo lling p rti ul rly 

m t o of yin inf r n . 

1 What We Do Badly 

w kn in urr nt un r t n ing of o j tr ogntion 11 pp r to om 
from our on ption of n o j t mo 1 p iv un tru tur r po itory of 
t il g om tri inform tion. 



lit r tur i ri wit in ivi u 1 u to o j t i ntity from urf olour 
to g om tri primitiv . t i unu u 1 to v t u gr on nyt ing; t 
r ulting m rr m nt i voi y not omp ring t u or y ignoring 
u t m ur olour or t tur . i i riou mi t k . t i quit 

1 r t t it i g n r lly tt r to v mor u v n if om r mor r li 1 
t n ot r . 

Our u of olour n ing inform tion i till un omfort ly w k pro - 

ly u w on t V mo 1 t t n 1 (or ignor ) t n ty p y i 1 

t of int rr fl tion or olour 1 ing. m not w r of t u of t tur 

D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 9-21, 1999. 

© Springer-Verlag Berlin Heidelberg 1999
in r ognition pt in t mpl t m t 

In it If to ir t t mpl t m t ing 
m pi ut it n ri 

tion. ignor t i inform tion pro 
for int gr ting it wit ot r u n 
mo 1 t t op wit in ivi u 1 v ri 



ing ppro . tur o n t r lly 

not V ry tig r v ry trip in t 
n t ription ont in inform - 

ly u w on t V m ni m 
u w V n t prop rly formul t 
on ( tr tion g in!). 



rforming r ognition t prop rly tr 1 1 v 1 (w r t ” p r on” or 

i y 1 ” i ion i m for t pi ig t or m nuf tur r r known) 

i om t ing w know not ing out. know from omput r in t t 
i r r i r t prop r w y to 1 rg num r of o j t i ntly n 

t r i om vi n 20 21 t t p opl m k i r nt kin of i ion out 

0 j t i ntity ( t gory-1 v 1” n in t n -1 v 1” i rimin tion) ugg ting 

t pr n of i r r y. n turn t i ugg t t t tr tion my Ip wit 

i n y. 

11 urr nt r ognition Igorit m p rform r ognition t t Ivlofini- 

vi u 1 in t n in 11 t mo 1 ; ny t gory-1 v 1 r ognition i p rform 

1 t r. i 1 to t r t ri ti lu i rou ly in i nt r t roug mo - 

1 . mpli itizing t i r y u ing g om tri inv ri nt imply r t logg 

t 1 ; it o not olv t pro 1 m u it o not 1 . 

3 

r i u pi iou profu ion of w it o j t on 1 k kgroun in t 

urr nt r ognition lit r tur ( igni nt v ri nt in lu multi u o j t on 

gr n kgroun n 1 k o j t on w it kgroun ). i i pro u t 

of t vi w t t gm nt tion n r ognition r i tin t pro 1 m n t t 

w n t rly vi ion ommunity g t roun to olving gm nt tion v ryt ing 
will ok. vi w i p rni iou n t pr ti it n our g r u iou . 
r ognition ytmt toprt t nillvlof tr tion woul r w 
ro g n ri i tin tion tw n o j t t r t; knowl g of t kin of 

i tin tion i pr i ly w t t t rting t g of gm nt tion n . n ot r 
wor o j t r gm nt inwol utyrojt;w oul rig 

t gm nt tion pro to n t kin of vi n t t 1 to t o j t 
w r looking for. 

i vi w of gm nt tion i u u 1 in pr ti t i i w t int r t op r- 
tor lin n oni r 11 out ut ploring t full pow r of t vi w 
n t n y u w v only t v gu t not ion of w t t vi n 
oul . ol -f ion vi w i t t o j t oul mo 11 om- 

po it of primitiv t t rm p rt” i oft n u n t t n ing im g 

vi n of t pr n of t primitiv i w t gm nt tion i out 2 . 

primitiv mig t p ( n t / / on/ t . t ) ut 

t y mig t 1 o r t ri ti ing or olour v nt . tr ngt of t i 

vi w i t t on n imly ow to uil n i 1 r ognition y t m lik 
t i on nil r p t gm nt tion pro t t n v ry g n r 1 wi ly 

u primitiv rtnt n mlt primitiv into 1 rg r n 1 rg r r - 
gion of ini g vi n . gm nt tion i pr ti 1 u at each stage, 

we know what we ’re looking for. g in t i i ol n w ut i Uy nil 
n w for mpl now t fu out pp r n r ognition i 

own t t ommunity i moving v ry qui kly in t i ir tion 9 pr i ly 
u it i t only w y to 1 wit gm nt tion n ompl mo 1 

w kn it t w on t know w t t primitiv r w t t mo 1 

look lik or ow to nil t mo 1 . 



t woul goo t ingifourr ognition y t m oul r ogni m ny o j t 

n if t r ognition pro i not r quir u t nti 1 r ngin ring tim 
n w o j t w n ount r . lly mo 1 of n w o j t oul o t in 

y owing t y t m v riou in t n in v riou vi w . i i not n rily 
n rgum nt for statistieal 1 rning t ory; t prop rti r v n uilt 

into g om tri r oning y t m 22 It oug oing o g n r lly r quir om 

form of impli it t ti ti 1 r oning. 

1 tion of m ur m nt ro ly int rpr t i poorly un r too . 

ypi 1 urr nt r ognition y t m will u om t of g om tri 1 m ur - 

m nt to t rmin w t r n o j t i pr nt; of our t r will It r- 

n tiv t of m ur m nt t t oul u . i on oul w u n 
w y? i i n rrow v r ion of t pro 1 m of primitiv w i primitiv 
oul w u in r ognition n w y? ot pro 1 m pp r in tr t to 

mo 1 1 tion pro 1 m . 

u 

t t of V ri tion i i gr ; it r w oul n t o it (w i oul 

pr f rr on t groun t t w r only oing it uw vnt ml 
mu of t V il 1 vi n for our ypot i for t ting it) or w oul 

o it prop rly. urr nt pr ti of ounting g point n ro ing ng r 
imply i n t goo noug . pro 1 mi on of ypot i t ting (or of i r t 
mo 1 1 tion) to w t gr o t im g vi n in t i r gion upport 

t following o j t ypot i ? i i f irly tr ig tforw r pro 1 m wit 

r 1 tiv ly tr ig tforw r t ti ti 1 olution mo 1 1 tion g in. 

2 A Brief Sketch of Bayesian Inference 

Probability provides a mechanism for comparing instances with collectives of
previous examples; this mechanism makes it possible to combine evidence from
various sources. The process of comparison is through a probabilistic model of
the measurements produced, given the state of the world. For example, we might
take a large number of pictures of sunsets and estimate

P(big red blob | picture is sunset)

by frequency estimates. This number could be interpreted either as a statement
of expected frequencies or as a "degree of belief" that a big red blob will appear
in a picture of a sunset. It gives the probability of measurements given the state
of the world; this term is often referred to as the conditional or likelihood.

The main use for probabilistic models in recognition is inference. Generally,
we expect to have a probabilistic model of the state of the world before we
measure it. This is the prior; in the example, the probability that a picture
taken from our collection is a picture of a sunset, or

P(picture is sunset)

Bayesian philosophy says that all our knowledge of the state of the world
is encapsulated in the posterior, the probability of world states given
observations. The posterior accounts for the effect of observations on the probability
that (in our example) a picture taken from our collection is a picture of a sunset.

By Bayes' rule, the posterior is proportional to the product of the prior and the
likelihood, so we have

P(picture is sunset | big red blob) ∝ P(big red blob | picture is sunset)
                                      × P(picture is sunset)

Most computer vision researchers have seen this expression many times without
any great delight. Its great importance is that generative models, which give
the way that data is produced given the state of the world, can be turned
into recognition models just by multiplying by the prior. One common objection,
that priors can be arbitrary, is, I think, empty; all but the silliest choices
of prior are overwhelmed by data in the kinds of problem we wish to solve.
The real problem is that we need to be able to compute with the resulting posterior,
and that is where difficulties arise.
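
As a rough illustration of turning a generative model into a recognition model by multiplying by the prior, the sketch below estimates the likelihood and prior from frequency counts and normalises the product. The function name and all counts are hypothetical, invented only for this example; they do not come from the text.

```python
from fractions import Fraction


def posterior_sunset(n_sunset, n_sunset_with_blob, n_other, n_other_with_blob):
    """Return P(sunset | big red blob) from frequency counts (hypothetical data)."""
    total = n_sunset + n_other
    # Priors, estimated by frequency over the whole collection.
    prior_sunset = Fraction(n_sunset, total)
    prior_other = Fraction(n_other, total)
    # Likelihoods, estimated by frequency within each class.
    like_sunset = Fraction(n_sunset_with_blob, n_sunset)
    like_other = Fraction(n_other_with_blob, n_other)
    # Posterior is proportional to likelihood times prior; normalise over both classes.
    unnorm_sunset = like_sunset * prior_sunset
    unnorm_other = like_other * prior_other
    return unnorm_sunset / (unnorm_sunset + unnorm_other)


# Made-up collection: 100 sunsets (80 with a blob), 900 other pictures (45 with a blob).
print(posterior_sunset(100, 80, 900, 45))
```

Exact fractions are used here only to keep the arithmetic easy to follow by hand.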

u 

impl r 1 tion ip tw n g n r tiv mo 1 n inf r n it ttr - 
tion of yin mo 1 . f on pt t i ypot i of t yin 
p ilo op y t t t po t rior n p ul t our knowl g of t worl 
inform tion int gr tion i impl . On ju t form t po t rior orr pon ing 

to t m ur m nt v il 1 . o ition 1 ompli tion pp r u t 

g n r tiv mo 1 (or lik li oo w i ri t pro t t 1 tot 

m ur m nt ) i u u lly y to o t in. 

n t following pi o Mun ymkt tttrino non- 

i 1 tru tur of 1 tr tion n o no i for ling wit t worl 

tnylvlot rt nt tofintn . imywll oun p ilo op y 
ut it i impr ti 1 y t m r it tur . v nt g of 1 tr tion i 

i lly t t on kin of m ur m nt no multipl uty. u it i wort 

n ing t n r gion wit nrprllli utr r n wful lot 

of t ing t t look r t r lik ylin r. 1 irri pprtoor 
i nt r pr nt tion y m king r 1 tiv ly m 11 num r of i ion to i ntify 
r 1 tiv ly 1 rg num r of o j t . f t i promi n rli t fttt 
t i r r y i rti i 1 i irr 1 v nt. 

ro ili ti mo 1 implify on tru ting 1 irri u t y n 

n o pli itly t v ri tion tw n in t n of 1 . i yi 1 n imm - 

i t olution to n ol pro 1 m wit g om tri primitiv it i r to prov 
nyt ing u ful out o j t t t r not exactly in t n of t primitiv 

1 ut V ry only lig tly from in t n . olution i to uil lik li oo 
mo 1 roun t m ur m nt. u for mpl if w r looking for um n 

lim w n n t from t f t t t t outlin r not only not tr ig t 

n prill utt wyt tt y i r from ing tr ig t n p r 11 1 i 

tru tur . 

ro ili ti mo 1 m y 1 o Ip to t rmin ppropri t primitiv . o 

not gr wit Mun y rgum nt t t primitiv ompo ition i or v n 
oul noni 1. t m mor on tru tiv to vi w ompo ition into 

p rt or primitiv onv ni n for r pr nt tion 1 i n y. n t i vi w 

t i tin tiv prop rti of primitiv r t t t y o ur on m ny o j t in 
imil r form ; t t t ir pr n i u ful gui to t pr n of n o j t; 

n t t t ir pr n 1 to i tin tiv im g prop rti to gui inf r n . 

pp r to m to t ti ti 1 rit ri . 

11 1 omm nt o r om op for r tion 1 1 ory of gm nt tion if 

we can extract information from posteriors easily. 



ttr tion of t y i n vi w i t t (giv n nt g n r tiv mo 1) 

11 pro 1 m wit int gr ting inform tion i pp r. Of our t tri ky it i 
tr ting inform tion from t po t rior n t i t n to t pro 1 m 
in t p t. t i V ry y to t up pro 1 m w r t prior n t lik li- 

00 11 pp r impl n y 1 1 po t rior i v ry r to n 1 (t olour 

on t n y mpl giv n low i of t i kin ). i ion pro 1 m r too ig n 
too i or rly to u onjug t i tri ution ( n ol -f ion o g of u ing 

mo 1 w r t prior lik li oo n po t rior 11 turn out to v n y” 
form). mig t i to oo worl mo It t m imi t po t rior 

ut ow o w g 1 1 i m imum? y i n gm nt t ion u ing M rkov r n om 

1 foun r on t i point. 

The currently fashionable view in the statistical community says that
information can be extracted from the posterior by drawing a large number of
samples from that distribution. Thus, for example, if we want to decide whether to
blow something up or not, based on image evidence, we would form the posterior,
draw samples from this, compute an expected utility for blowing it up and for
not blowing it up by averaging the utilities for these samples, and choose the
option with the larger expected utility^.
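
The sample-averaging decision described above can be sketched as follows. This is a minimal illustration, assuming posterior samples are already available as a list; the action names, the utility values and the helper `choose_by_expected_utility` are all hypothetical.

```python
def choose_by_expected_utility(samples, utilities):
    """Average each action's utility over posterior samples; return the best action."""
    expected = {}
    for action, utility in utilities.items():
        expected[action] = sum(utility(s) for s in samples) / len(samples)
    best = max(expected, key=expected.get)
    return best, expected


# Hypothetical posterior samples: True means "the world state is a genuine target".
posterior_samples = [True] * 7 + [False] * 3

# Hypothetical utilities: acting on a non-target is heavily penalised.
utilities = {
    "act": lambda is_target: 1.0 if is_target else -5.0,
    "wait": lambda is_target: 0.0,
}

best, expected = choose_by_expected_utility(posterior_samples, utilities)
print(best, expected)
```

Even though most samples say "target", the asymmetric utilities make waiting the better option here, which is the point of averaging utilities rather than picking the most probable state.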

Drawing samples from a posterior is not at all easy. Markov chain Monte
Carlo methods appear to be the answer. A typical algorithm is the
Metropolis-Hastings algorithm, which would produce in this case a sequence of hypotheses,
by taking an hypothesis T_i and proposing a revised version, T_i'. The new
hypothesis T_{i+1} is either T_i or T_i', depending (randomly) on how much better the
posterior associated with T_i' is. Once sufficient iterations have completed, all
subsequent T_i are samples drawn from the posterior; the number of iterations
required to achieve this is often called the burn in time. These samples may
or may not be correlated; if this correlation is low, the method is said to mix
well. It is known how to apply this algorithm to chains whose domain of support
is complicated (for example, the number of hypotheses may not be known a
priori) [7].
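
A minimal sketch of the propose-and-accept loop, under simplifying assumptions that are not in the text: the hypothesis is a single real number, the proposal is a symmetric Gaussian random walk (so this is the plain Metropolis special case of Metropolis-Hastings), and the target is a toy standard normal given only up to a constant. The function name, step size and seed are illustrative.

```python
import math
import random


def metropolis_hastings(log_post, x0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalised 1-D log posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    chain = []
    for _ in range(n_iter):
        # Propose a revised hypothesis by perturbing the current one.
        x_new = x + rng.gauss(0.0, step)
        lp_new = log_post(x_new)
        # Accept with probability min(1, posterior(x_new) / posterior(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
        # The next state is either the proposal or a repeat of the old state.
        chain.append(x)
    return chain


# Toy target: standard normal, log density known only up to a constant.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_iter=20000, step=2.0)
kept = chain[2000:]  # discard an (arbitrarily chosen) burn-in period
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / len(kept)
print(round(mean, 2), round(var, 2))
```

The burn-in length of 2000 is a guess for this toy problem; as the text notes later, there is no reliable way to tell from the samples alone that a chain has burnt in.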

M tropoli - ting Igorit m oul vi w kin of oup up 

ypot iz n t t pro . propo r pr nt tion of t worl n 

pt or r j tit on t po t rior; our r pr nt tion oft posteriori n 

on i t of 1 rg t of pt propo 1 . i vi w ju ti u ing urr nt 
vi ion Igorit m our of propo 1 . ru i 1 improv m nt i t t 

w n u i r nt in omp ti 1 Igorit m i tin t our of propo 1 

n t mpl w o t in r pr nt t po t rior in orpor ting 11 v il 1 

m ur m nt . mpl in tion 3 illu tr t t i ppro in gr t r 

t il. 

r r om riou Igorit mi pro 1 m it i not po i 1 tot Hr li ly 

w t r in urnt in y looking 1 1 mpl t in pro u ; in 
n mi tr m ly lowly n oft n o if not v ry r fully ign 10 ; n 

t i r n tw n u ful Igorit m n t trop i f ilur r t 

on t propo 1 pro . t mpling or oupling from t p t i (not 

t rri ly pr ti 1) m t o for ling wit t r t pro 1 m 19 17 ; t ot r 

two r not going w y nytim oon. v nt g of r pr nting m iguity 

n rror pli itly pp r to outw ig t i ulti . 

3 An Example: Colour Constancy by Sampling 

olour on t n y i goo impl mpl t t om of t fl vour of 

r ognition int n ttw ru ing mo 1 to m k inf r n from 

im g o rv tion . n t impl t v r ion w r in worl of fl t front 1 

urf wo i u r fl t n long to low im n ion 1 lin r f mily 

illumin t y olour our . o not know t olour of t our n 

wi to t rmin urf olour w t v r t our olour. pro 1 m n 

1 om if V r olv tly. 

r r m ny i r nt m ni m of w i giv i r nt ti- 

m t . t i u u 1 to um t t t illumin nt ng lowly ov r p ; u 



^ Right now, it would have to be a fairly slow target; but computers get faster. 
in lu n umption of on t nt v r g urf olour 3 lit up 

of r ptor p to w i t urf m p 1 t pr n of rp ng 

in im g rig tn 13 t f t t t p ul riti typi lly t k t our 
olour 12 14 n p y i 1 on tr int on r 11 t n n /or illumin nt 4 . 
11 of t u r i lly V li n 11 oul u .Ltugo kto 
t of olour on t n y. woul lik to u t following on tr int 

llumin nt v ry only lowly ov r p 
p ul riti yi 1 u to urf olour. 

llumin nt nrfltn r r wn from nit im n ion 1 lin r f mi- 
li . 

fl t n r V ryw r ov 0.012 n low 0.96 in v lu (t i giv 

n 0 1 yn mi r ng t w v 1 ngt on i t nt wit ot r m t o 

n p y i 1 vi n 
llumin nt r V ryw r po itiv . 

ignor vrgrfltn ty will ov r y t prior. 

3 

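We model surface reflectances as a sum of basis functions $\phi_j(\lambda)$, and assume
that reflectances are piecewise constant:

$$s(x, y, \lambda) = \sum_{j=0}^{n_s} \sigma_j(x, y)\, \phi_j(\lambda)$$

Here $\sigma_j(x, y)$ are a set of coefficients that vary over space according to some
model; in the example we describe, they are constant in a grid of boxes, where
the grid edges are not known in advance.

Similarly, we model illuminants as a sum of (possibly different) basis functions
$\psi_i$ and assume that all spatial variation is given by the presence of a single point
source positioned at $p$. The diffuse component due to the source is

$$e_d(x, y, \lambda, p) = d(x, y, p) \sum_{i=0}^{n_e} \epsilon_i\, \psi_i(\lambda)$$

where $\epsilon_i$ are the coefficients of each basis function and $d(x, y, p)$ is a gain term
that represents the change in brightness of the source over the area viewed.
The specular component due to the source is

$$e_m(x, y, \lambda, p) = m(x, y, p) \sum_{i=0}^{n_e} \epsilon_i\, \psi_i(\lambda)$$

where $m(x, y, p)$ is a gain term that represents the change in specular component
over the area viewed.

Standard considerations yield a model of the k'th receptor response as

$$p_k(x, y) = \int \left[ s(x, y, \lambda)\, e_d(x, y, \lambda, p) + e_m(x, y, \lambda, p) \right] \rho_k(\lambda)\, d\lambda$$
$$\phantom{p_k(x, y)} = d(x, y, p) \sum_{i,j} g_{ijk}\, \epsilon_i\, \sigma_j(x, y) + m(x, y, p) \sum_i h_{ik}\, \epsilon_i$$

where $g_{ijk} = \int \rho_k(\lambda)\, \psi_i(\lambda)\, \phi_j(\lambda)\, d\lambda$ and $h_{ik} = \int \rho_k(\lambda)\, \psi_i(\lambda)\, d\lambda$. In this case,
$\sigma_j(x, y)$ is piecewise constant with some spatial model; in what follows, we
assume that it is piecewise constant on a grid, but we do not know what the grid
edges are. The spatial model for the illuminant follows from the point source
model, where $p$ is the position of the source, and $m(x, y, p)$ is obtained using a
Phong model of specularities.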
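
To make the receptor model concrete, the sketch below evaluates the response of receptor k at one pixel as a diffuse term, d times the sum over i and j of g[i][j][k] * eps[i] * sigma[j], plus a specular term, m times the sum over i of h[i][k] * eps[i]. This reading of the formula, the function name and the tiny tensors are assumptions made for illustration; in practice g and h would come from integrating the receptor sensitivities against the basis functions.

```python
def receptor_response(g, h, eps, sigma, d, m):
    """Receptor responses at one pixel under a finite-dimensional linear model.

    g[i][j][k]: diffuse coupling of illuminant basis i, reflectance basis j, receptor k
    h[i][k]:    specular coupling of illuminant basis i and receptor k
    eps:        illuminant coefficients; sigma: reflectance coefficients at the pixel
    d, m:       diffuse and specular gain terms at the pixel
    """
    n_receptors = len(h[0])
    responses = []
    for k in range(n_receptors):
        diffuse = sum(
            g[i][j][k] * eps[i] * sigma[j]
            for i in range(len(eps))
            for j in range(len(sigma))
        )
        specular = sum(h[i][k] * eps[i] for i in range(len(eps)))
        responses.append(d * diffuse + m * specular)
    return responses


# Hypothetical example: two illuminant bases, two reflectance bases, one receptor.
g = [[[1.0], [2.0]], [[3.0], [4.0]]]
h = [[1.0], [2.0]]
print(receptor_response(g, h, eps=[1.0, 0.5], sigma=[0.2, 0.4], d=2.0, m=0.5))
```

The response is linear in the illuminant coefficients for fixed reflectance and linear in the reflectance coefficients for fixed illuminant, which is what makes both conditional sampling steps Gaussian.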

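We choose a uniform prior for reflectance coefficients. We expect illuminants
to have no chromatic bias, and so use a Gaussian prior whose mean is white; we
allow a fairly substantial standard deviation, to allow for illuminants that are
coloured.

Thus, the generative model is:

sample the number of reflectance steps in x and in y ($k_x$ and $k_y$ respectively);
now sample the position of the steps ($e_x$ and $e_y$ respectively);
for each tile, sample the reflectance for that interval from the prior ($\sigma^m$ for
the m'th tile);
sample the illuminant coefficients $\epsilon_i$ from the prior;
sample the illuminant position $p$ from the prior;
and render the image, adding Gaussian noise.

So we have a likelihood,

$$P(\text{image} \mid k_x, k_y, e_x, e_y, \sigma^m, \epsilon_i, p)$$

The posterior is proportional to

$$P(\text{image} \mid k_x, k_y, e_x, e_y, \sigma^m, \epsilon_i, p) \times \mathrm{Prior}(k_x)\, \mathrm{Prior}(k_y) \times \mathrm{Prior}(e_x)\, \mathrm{Prior}(e_y) \times \prod_{m \in \text{tiles}} \mathrm{Prior}(\sigma^m) \times \mathrm{Prior}(\epsilon_i)\, \mathrm{Prior}(p)$$

All we have to do is draw samples from this.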
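
The generative steps can be sketched in a heavily simplified form. The assumptions here go beyond the text: the scene is one-dimensional with a single colour channel, the illuminant is one positive scalar with no position, and the priors on step count and step position are uniform. Only the reflectance bounds of 0.012 and 0.96 are taken from the constraints listed earlier; every name and parameter is illustrative.

```python
import random


def sample_scene(width, max_steps, noise_sd, rng):
    """Draw one 1-D scene from a toy version of the generative model."""
    # Sample the number of reflectance steps, then their positions.
    k = rng.randint(0, max_steps)
    edges = sorted(rng.sample(range(1, width), k))
    bounds = [0] + edges + [width]
    # Sample one reflectance per tile from a uniform prior within physical bounds.
    tiles = [rng.uniform(0.012, 0.96) for _ in range(k + 1)]
    # Sample a positive illuminant coefficient from a Gaussian prior centred on 1.
    illum = max(1e-6, rng.gauss(1.0, 0.3))
    # Render the image and add Gaussian noise.
    image = []
    for tile, (lo, hi) in zip(tiles, zip(bounds, bounds[1:])):
        for _ in range(lo, hi):
            image.append(illum * tile + rng.gauss(0.0, noise_sd))
    return {"k": k, "edges": edges, "tiles": tiles, "illum": illum, "image": image}


scene = sample_scene(width=32, max_steps=4, noise_sd=0.01, rng=random.Random(1))
print(scene["k"], scene["edges"], round(scene["illum"], 3))
```

Running the model forwards like this is the easy direction; inference means drawing samples of the hidden quantities given the image, which is what the sampler in the following section is for.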



3 

mpling pro i tr ig tforw r M M . ropo 1 mov r of four 

typ 
t t il r in mpl 1 of 7 wit t ption 

t t w riv propo 1 i tri ution for t po ition of t t p from t 

im g gr i nt o t t w r mor lik ly to propo t p w r t 
gr i nt i ig . 

in mpl 1 of 7 . 

in mpl 1 of 7 . 

fl u t i om u tl pitf 11 . t i 

t mpting to illumin tion n ngrfltn tn rfltn n 
ng illumin tion u ot t p will involv mpling u i n 
w i i y. i it turn out i i u t in mov 

tr m ly lowly if w o t i . pi n tion i quit impl ; giv n r - 

fi t n /illumin tion p iring t t i quit goo m 11 ng 1 to ug 

in r in t rror. u t pro will 1 to r fl t n t t work 
w 11 wit t urr nt illumin nt or n illumin nt t t work w 11 wit t 

urr nt r fl t n n will mov tr m ly lowly, n t w u m t o 

u to 1 1 t t uppr r n om w Ik y joining mom ntum v ri- 
1 n t n mo lling t t t t po ition of p rti 1 in n n rgy 

1 ; typi lly t t t mov to po ition of 1 rg po t rior v lu qui kly 

t w i point w t row w y t mom ntum v ri 1 . 



3 3 X X 



ow om r 


ult 0 t in u 


ing 


r 


1 t 


t from . i t w 


p 0- 


togr p wit 


m r 


t n 


i 


pi y 


on 


p otogr p 


from 


t t r n wit 


Im m r 


u j 


t 


to unknown printing pro 


n 


t n nn from t pu li 


P P 


r; 


u 


u 


tt r t li 


r t ly 


to illu tr t t 


pot nti 1 pow 


r of t 


pro 


I 


r no p ul r 


ompo- 


n nt in t i 


t t m king t 


P 


ul 


r r 


oning 


impl . 0 t in 


i 


u ing t m 


ni m of M rimont n 




n 


11 16 . 


on tr int on o 


i nt 


w r tim t 


u ing gr p i 


1 m t 


0 










n 


t p ti 1 mo 


Ipl 




g 


1 1 


rig t point ; t i i 


r ly 


urpri ing 


olumn n row 


g 


om ig 


ontr t point . 


Figure 1 shows a scatter plot of reflectance samples estimated for various
corresponding tiles, for images obtained under different coloured illuminants,
under the assumption that each image was completely independent. Since the
samples lie in reasonably close groups compared with the receptor response
groups, the algorithm is displaying constancy. Samples contain not only
estimates of reflectance but also information about a reasonable range of
solutions. This is crucial and powerful.

Current colour constancy algorithms cannot deal with prior knowledge about
the world. This algorithm can. For example, consider the effect of knowing that
tile i in an image under white light is the same as tile j in an image obtained
under purple light. It is quite plausible to want to know this: knowing that
an object is a banana should affect our report on its colour. In our
representation, we can perform this calculation by resampling the set of samples.
We sample pairs of representations, one obtained under the white light, the
other under the purple light, so that pairs where the two reflectances are
similar are represented more often (this can all be made formal). Figure 2
shows the effect on results; this information reduces uncertainty. Further
details appear in [6].

4 How Inference Methods Can Help Address Our
Problems

Probabilistic models are difficult to use and to set up, and there are no reliable
algorithms for handling probabilistic models on the scale that vision will require.

The paradigm is the right one; we have learned to use geometric models well, and we
should now be directing our efforts toward using probabilistic models well.

While sampling algorithms are currently slow and difficult to build well, they have
advantages that compensate for this:

We get a good representation of all the conclusions that can reasonably be
drawn from the data (i.e. the posterior).



It is easy to see how to include other forms of information in the reasoning
process: rewrite the likelihood, add more proposal mechanisms, and proceed.

If a form of conditional independence applies (which it does in useful cases),
we can resample an existing set of samples.

We do not need to abandon current algorithms to achieve this; instead, we
can cloak them in probability and use them as a proposal process (as with
1 Introduction

This paper discusses some problems that should be addressed by future object 
recognition systems. 

In particular, there are things that we know how to do today, for example: 

1. Computing the pose of a free-form three-dimensional object from its outline 
(e.g. [106]). 

2. Identifying a polyhedral object from point and line features found in an 
image (e.g., [46, 89]). 

3. Recognizing a solid of revolution from its outline (e.g., [59]). 

4. Identifying a face with a fixed pose in a photograph (e.g., [10, 111]). 

There are, however, things that we do not know how to do today, for example: 

1. Assembling local evidence into global image descriptions (grouping) and us- 
ing those to separate objects of interest from the background (segmentation). 

2. Recognizing objects at the category level: instead of simply identifying Bar- 
ney in a photograph, recognize that he is a dinosaur. 

This is of course a bit of an exaggeration: there is a rich body of work on
grouping and segmentation, ranging from classical models of these processes in 
human vision (e.g., [65, 116]) to the ever growing number of computer vision 
approaches to edge detection and linking, region merging and splitting, etc., 
(see any recent textbook, e.g., [38, 73] for surveys). Likewise, almost twenty 
years ago, ACRONYM recognized parameterized models of planes in overhead 
images of airports [17], and the recent system described in [33] can retrieve 
pictures that contain horses from a large image database. Still, segmentation 
algorithms capable of supporting reliable recognition in the presence of clutter 
are not available today, and there is no consensus as to what constitutes a good 
representation/recognition scheme for object categories. 

This paper examines some of these issues, concentrating on the role of shape 
representation in recognition. We first illustrate some of the capabilities of cur- 
rent approaches, then lament about their limitations, and finally discuss current 
work aimed at overcoming (or at least better understanding) some of these lim- 
itations. 

* This work was partially supported by the National Science Foundation under grant 
IRI-9634312 and by the Beckman Institute at the University of Illinois at Urbana- 
Champaign. M. Cepeda is now with Qualcomm, Inc. and S. Sullivan is now with 
Industrial Light and Magic. 
2 The State of the Art and Its Limitations

Let us start with an example drawn from our own work to illustrate the capa- 
bilities and limitations of today’s recognition technology. While it can certainly 
be argued that more powerful approaches already exist (and we will indeed dis- 
cuss alternatives in a little while), this will help us articulate some of the issues 
mentioned in the introduction. 



2.1 An Example of What Can Be Done Today 

Here we demonstrate that the pose of a free-form surface can be reliably esti- 
mated from its outline in a photograph. Two obvious challenges in this task are 
(1) constructing a model of the free-form surface that is appropriate for pose 
estimation and (2) computing the six pose parameters of the object despite the 
absence of any three-dimensional information. 

We have developed a method for constructing polynomial spline models of 
solid shapes with unknown topology from the silhouette information contained 
in a few registered photographs [106]. Our approach does not require special- 
purpose hardware. Instead, the modeled object is set in front of a calibration 
chart and photographed from various viewpoints. The pictures are registered 
using classical calibration methods [110], and the intersection of the visual cones 
associated with the photographs [9] is used to construct a $G^1$-continuous trian-
gular spline [22, 30, 60, 100] that captures the topology and rough shape of the
modeled object. This approximation is then refined by deforming the spline to 
minimize the true distance to the rays bounding the visual cones [108]. Figure
1(a)-(b) illustrates this process with an example.

The same optimization procedure allows us to estimate the pose of a modeled 
object from its silhouette extracted from a single image. This time, the shape 
parameters are held constant, while the camera position and orientation are 
modified until the average distance between the visual rays associated with the 
image silhouette and the spline model is minimized. In fact, the residual distance 
at convergence can be used to discriminate between competing object models in 
recognition tasks. Figure 1(c)-(d) shows some examples. But...

2.2 Is This Really Recognition? 

Of course not: we have relied on an oracle to tell us which pieces of contours 
belong to which object, since the spline representation does not provide any sup-
port for top-down grouping (as shown by Fig. 1(d), occlusion is not the prob-
lem). Although it is possible that some bottom-up process (e.g., edge detection
and linking, maybe followed by the elimination of small gaps and short contour 
pieces and the detection of corners and T-junctions) would yield appropriate 
contour segments, this is not very likely in the presence of textured surfaces and 
background clutter. Indeed, the contours used as input to the pose estimation 
algorithm in Fig. 1 were selected manually [107]. 
Fig. 1. Automated modeling and pose estimation: (a) nine views of a (toy) dinosaur;
(b) the corresponding spline model, Gouraud-shaded and texture-mapped; (c) and (d) 
results of pose estimation. 

The Barney (or maybe T. Rex in this case) vs. generic dinosaur problem
mentioned earlier is also apparent here: the spline is a purely numerical object 
description, and it is hard to imagine how it would support the recognition of 
object classes. Another possible objection to this approach is the lack of support 
for modelbase indexing, but this may not be that bad of a problem: after all, 
the cost of matching every model to the given contour data is only linear in the 
size m of the database (in our example, m = 3, a rather small value).

All this does not mean that the spline models of Fig. 1 are not useful in 
practice: indeed, they provide a low-cost alternative to using, say, a Cyberware 
rangefinder to construct detailed graphical models of three-dimensional objects. 
Likewise, pose estimates could be used in pick-and-place robotic manipulation 
of isolated¹ objects presented on a dark background. However, it is pretty clear
that this approach does not hold the key to constructing the recognition systems 
of the future. The next section discusses alternatives. 



2.3 What Else Then? 

Numerical/combinatorial methods use geometric filters to identify a sufficient
number of matches between image and model features. They include alignment 
techniques (e.g., [46, 62, 89, 112]) and affine and projective invariants (see, for 
example, [26, 115] for early applications of invariants to object recognition, and 
[69, 70] for recent collections of papers). In the former case, matching proceeds 
as a tree search, whose potentially exponential cost is controlled by using the 
fact that very few point matches are in fact sufficient to completely determine 
the object pose and predict the image positions of any further matches (see 
[4, 31, 36] for related work). In the latter case, small groups of points are used 
to directly compute a feature vector independent of the viewpoint (hence the 
name of invariant) that, in turn, can be used for indexing a hash table storing 
all models. An advantage of invariants is that indexing can be achieved in sub- 
linear time. A disadvantage is that groups of three-dimensional points in general 
position do not yield invariants [20, 23, 68] (but see [32, 59, 120] for certain 
object classes that do admit invariants). 
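
The indexing idea behind invariants can be sketched in a few lines. The example below is only an illustration under simplifying assumptions that are not taken from the systems cited above: features are coplanar points, views are related by a 2D affine map, and the names `affine_coordinates`, `hash_key`, the cell size and the label "model-A" are all hypothetical. It expresses a fourth point in the frame of three basis points; these coordinates are unchanged by any invertible affine map, so they can be quantized into a hash key.

```python
import numpy as np

def affine_coordinates(basis, point):
    """Coordinates (alpha, beta) of `point` in the affine frame of three basis points."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in basis)
    frame = np.column_stack([p1 - p0, p2 - p0])      # 2x2 matrix of frame vectors
    return np.linalg.solve(frame, np.asarray(point, dtype=float) - p0)

def hash_key(coords, cell=0.05):
    """Quantize invariant coordinates so that they can index a hash table."""
    return tuple(np.round(np.asarray(coords) / cell).astype(int))

# Hypothetical model: three basis points plus one extra feature point.
model_basis = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
model_point = (1.0, 1.5)
table = {hash_key(affine_coordinates(model_basis, model_point)): "model-A"}

# Simulate a new view with an arbitrary invertible affine map x -> A x + b.
A = np.array([[0.8, -0.3], [0.4, 1.1]])
b = np.array([5.0, -2.0])
view = lambda p: A @ np.asarray(p, dtype=float) + b
image_basis = [view(p) for p in model_basis]
image_point = view(model_point)

# The image features yield the same key, so the model is retrieved directly.
image_coords = affine_coordinates(image_basis, image_point)
retrieved = table.get(hash_key(image_coords))
```

A lookup of this kind costs a constant number of operations per feature group, whatever the number of stored models, which is the sense in which indexing can be sub-linear. The sketch says nothing about three-dimensional points in general position, for which no such invariant exists.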

Alignment- and invariant-based approaches to object recognition do not re- 
quire (in principle) a separate grouping stage that constructs a global description 
of the image and/or its features: instead, it is in principle sufficient to construct 
a reliable local feature detector (a much easier task), since spurious features will
be rejected by the matching process. On the other hand, as in the case of spline 
models, it is not clear at all how these techniques, with their purely numerical 
characterization of shape, would handle object categories. 

Appearance-based approaches are related to pattern recognition methods. They 
record a description of all images of each object, and they have been successfully 
used in face identification [10, 111] and three-dimensional object recognition 

¹ But possibly quite complex (at least by computer vision standards): have another
look at the gargoyle and the dinosaur in Fig. 1. 

[71]. Their main virtue is that, unlike purely geometric approaches to recogni-
tion, they exploit the great discriminant power of image intensity information.
However, it is not clear how they would generalize to category-level recognition 
(see, however, [8] for preliminary efforts in that direction), and they normally
require (due to their essentially global nature) a separate segmentation step that 
distinguishes the objects of interest from the image background. For example, 
the three-dimensional object recognition system described in [71] uses images 
where isolated objects lie in front of a dark background. Despite its very impres- 
sive performance (real-time recognition of complex free-form, textured objects 
from a single photograph), it is unlikely that this system would perform as well 
with occlusion and background clutter. 

It should be noted that the recognition method proposed in [94] combines 
the advantages of invariant- and appearance-based techniques: it uses quasi- 
invariant properties [15] of the projection process to faithfully represent all 
images of an object by a small set of views, and relies on local descriptions 
(jets, see [54]) of the brightness pattern at interest points [40] for efficient in-
dexing. Explicit segmentation is avoided, and the results are excellent (e.g., a
dinosaur (again!) is easily recognized in a picture that contains a very cluttered 
forest background). Like other appearance-based techniques (e.g., [111]), this
one is sensitive to illumination changes, although recent advances in color anal- 
ysis [41, 102] suggest that combining the jets used in the current version of the 
system with local illumination invariants may solve the problem. Again, this 
approach does not address the problem of representing object classes. 

Structural representations. An alternative to the techniques discussed so far is to 
describe objects by part-whole networks of stylized primitives, such as general- 
ized cylinders [13] or superquadrics [5, 77]. This approach has several (potential) 
advantages: first, it offers a natural representation for object classes (similar ob- 
jects will hopefully have similar part-whole decompositions). Second, and maybe 
not quite as obviously, appropriate primitives may prove useful in guiding top-
down grouping and segmentation: e.g., to look for cylinder-like structures in an
image, look for pairs of curves that are more or less straight and parallel to each 
other [63]. There are also some known difficulties: for example, the images of 
a particular class of primitive shapes may not have simple properties that can 
readily be exploited by a top-down segmentation process. Another very difficult 
problem is to precisely and operationally define what constitutes an object part. 

A different type of structural representation is what might be called a weak 
modeling scheme, i.e., an approach where only very general assumptions are 
made about the world, and general mathematical results valid under these as- 
sumptions are used to parse images. Aspect graphs [52, 53] are an example of 
this approach: they exploit the fact that both the structure of image contours 
and the manner in which it changes with viewpoint are mathematically well 
understood. Hopefully, similar objects will have similar aspect graphs, and the 
understanding of image structure may serve as a guide to image segmentation. In 
practice however, aspect graphs have not fulfilled their promise, partly because
of the great difficulty of reliably extracting contour features such as termina-
tions and T-junctions from real images, and partly because of the fact that even
relatively simple objects may have extremely complicated aspect graphs (e.g., 
a polyhedron with n faces has an orthographic aspect graph with $O(n^6)$ cells
[35]; the situation gets even worse when perspective projection [105] and curved 
objects [78] are considered). 

Still, at this point we believe that structural approaches to recognition offer 
the best hope of tackling the segmentation and class representation problems, 
so we will revisit in the next two sections both generalized cylinders and aspect 
graphs (the latter in the context of evolving shape), and discuss some new twists 
that we are currently exploring. Before that, let us stress that we do not claim 
that generalized cylinders or aspect graphs are the way to go. Nor do we claim 
that the results presented in the next two sections are particularly impressive 
or an improvement over existing recognition technology. Rather, we believe that 
it is important to assess what these representation schemes really have to offer 
since we do not know of viable alternatives at this time. 

3 Generalized Cylinders 

Binford introduced generalized cylinders (GCs) in a famous unpublished 1971 
paper [13], defining them in terms of “generalized translational invariance”. 
Roughly speaking, a GC is generated by a one-dimensional set of cross-sections 
that smoothly deform into one another. Binford noted that a space curve form- 
ing the spine of the representation may be defined (but not always uniquely, e.g., 
a cylinder), and that “in general, we don’t expect to have analytic descriptions 
of the cross-section valued function, or the space curve called the spine” [13]. 

Such a definition is very general and quite appealing, but it is also very 
difficult to operationalize: in other words, although a great many objects (the 
fingers of my left hand for example) can certainly be described by sweeping 
smoothly-deforming cross-sections along some space curve, it is not clear at all 
how to construct the description of a given shape in a principled way. 

Most of the early attempts at extracting GC descriptions from images fo- 
cused on range data [1, 74, 103]. Among those, the work of Nevatia and Binford 
[74] is particularly noteworthy since it does implement a version of generalized 
translational invariance: their algorithm tries all possible cross-section orienta- 
tions of objects such as dolls, horses, and snakes, then selects subsets of the 
cross-section candidates with smoothly varying parameters. 

Methods for finding GC instances in video images have traditionally been 
based on the assumption that three-dimensional GCs will project onto two- 
dimensional ones, or ribbons: this is the approach used in the ACRONYM system 
of Brooks and Binford [17, 18] for example. For the limited class of GCs (circular 
cylinders and cones) and ribbons (trapezoids and ellipses) used by ACRONYM's
geometric reasoning system,² there is indeed a natural correspondence between
GC projections and ribbons, and the assumption is justified. 

² ACRONYM employed a wider repertoire of primitives for geometric modeling, but
not to draw inferences from images [17]. 

The situation is not as clear for more general GC and ribbon classes, and
this is probably one of the reasons why, following ACRONYM, most vision sys- 
tems using GCs as their primary shape representation moved toward simpler, 
and better understood, parametric and globally generative GC classes: Shafer 
and Kanade [97, 98] paved the way by introducing a taxonomy of generalized 
cylinders, of which the most commonly used today are probably straight ho- 
mogeneous generalized cylinders (SHGCs, also called generalized cones [44, 66] 
and constructed by scaling an arbitrarily-shaped planar cross-section along a 
straight spine) and solids of revolution (or RCSHGCs in Shafer’s and Kanade’s 
terminology). Another example of a sub-class of GCs is formed by geons, a set 
of twenty four GC types proposed by Biederman in the context of human vision 
[12], which have recently found use in machine vision [11, 25, 75]. 

Limiting the class of GCs under consideration makes it possible to predict 
viewpoint-independent properties of their projections [72, 83, 81]: for example, 
Nalwa proved that the silhouette of a solid of revolution observed under ortho-
graphic projection is bi-laterally symmetric [72] (i.e., it is an instance of a straight
Brooks ribbon [90]). Ponce, Chelberg and Mann showed that, under both ortho-
graphic and perspective projection, the tangents to the silhouette of an SHGC
at points corresponding to the same cross-section intersect on the image of the 
SHGC’s axis [84]. Other important problems (e.g., whether a shape admits a 
unique SHGC description [80]) can also be addressed in this context. 

Such analytical predictions provide a rigorous basis for finding individual GC 
instances in images [45, 84, 86, 93, 113, 118, 119] or recognizing GC instances 
based on projective invariants [59], and, indeed, very impressive results have 
been achieved: for example, the system implemented by Zerroug and Medioni is 
capable of automatically constructing a part-based description of a teapot from 
a real image with clutter and textured background [118]. 

Despite these undeniable successes, we believe that it is necessary to go be- 
yond simple sub-classes of GCs: most objects around us are not made up of 
instances of solids of revolution, SHGCs, canal surfaces, etc.: we do not live in 
a world made of glasses, bottles, cups and teapots. Restricting the application 
domain is useful, but in the end we must interpret scenes that contain the fa- 
miliar objects in my office, stapler, scissors, telephone and people, who may (or 
may not) have elongated parts, but cannot be described in terms of a small set 
of rigid primitives. This is our motivation for introducing a new breed of GCs in 
the next sections. 



3.1 Approach 

The notion of generalized cylinder proposed in this section is based on the intu- 
ition that the elongated parts of a shape are naturally described by sets of cross- 
sections whose area is as small as possible. These cross-sections correspond to 
valleys of a height function that measures the area of all possible cross-sections 
of the shape. The following definition of the topographic notions of valleys (or 
ravines) and ridges was given by Haralick, Watson, and Laffey [37, 39] in the 
context of image processing. 
Definition 1. The valley (resp. ridge) of the surface defined in $\mathbb{R}^3$ by a height
function $h : U \subset \mathbb{R}^2 \to \mathbb{R}$ over a 2D domain $U$ is the locus of the points
where the gradient $\nabla h$ of $h$ is an eigenvector of the Hessian $\mathcal{H}$, and where the
eigenvalue associated with the other eigenvector of the Hessian is positive (resp.
negative).
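
The definition can be turned into a pointwise test once the gradient and Hessian are known. The sketch below is not one of the valley finders cited in this paper; the function name `valley_test`, the tolerance and the toy height function are illustrative assumptions. It checks that the Hessian maps the gradient to a parallel vector, then uses the fact that the two eigenvalues of a 2x2 Hessian sum to its trace to obtain the other eigenvalue.

```python
import numpy as np

def valley_test(grad, hess, tol=1e-9):
    """Classify a point of a 2D height function from its gradient and Hessian.

    Returns "valley", "ridge", or None, following the definition above.
    """
    g = np.asarray(grad, dtype=float)
    H = np.asarray(hess, dtype=float)
    Hg = H @ g
    # 2D "cross product": zero exactly when H g is parallel to g.
    residual = Hg[0] * g[1] - Hg[1] * g[0]
    if g @ g < tol or abs(residual) > tol:
        return None
    eig_along_gradient = (g @ Hg) / (g @ g)
    other_eig = np.trace(H) - eig_along_gradient     # eigenvalues sum to the trace
    if other_eig > tol:
        return "valley"
    if other_eig < -tol:
        return "ridge"
    return None

# Toy height function h(x, y) = x**2 + 0.1 * y: a trough running along the y axis.
grad = lambda x, y: (2.0 * x, 0.1)
hess = lambda x, y: ((2.0, 0.0), (0.0, 0.0))

on_axis = valley_test(grad(0.0, 3.0), hess(0.0, 3.0))    # point on the trough
off_axis = valley_test(grad(1.0, 3.0), hess(1.0, 3.0))   # point on the slope
```

On sampled data the exact zero test would have to be replaced by a search for sign changes of the residual between neighbouring samples.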

Other definitions of ridges and valleys have been given by various researchers, 
including Crowley and Parker [24] and Eberly, Gardner, Morse, Pizer and Schar- 
lach [27]. As shown by Koenderink [55], the definitions of Haralick et al. and 
Eberly et al. are in fact equivalent to a much earlier one proposed by de Saint- 
Venant in 1852 [92]. Koenderink’s paper also includes a modern account of the 
different 1912 definition due to Rothe [91] that better captures the intuitive 
notion that water should flow down the valley and not cross the course of a 
river, but unfortunately does not afford a local criterion for detecting ridges and 
valleys. 

We will use the definition of Haralick et al. since, as shown in Section 3.2, it 
can be used to derive a local geometric condition for a cross-section of a shape 
to participate in a ribbon or a generalized cylinder. It also has the advantage of 
having been implemented in several ridge and valley finders [34, 39]. Finally, as 
shown by Barraquand [6, 7] and detailed in Section 3.3, it is readily generalizable 
to higher dimensions, where it still yields one-dimensional valleys and ridges. 
This will prove particularly important for defining GCs. 



3.2 Ribbons 

Consider a 2D shape bounded by a curve $\Gamma$ defined by $x : I \to \mathbb{R}^2$ and param-
eterized by arc length. The line segment joining any two points $x(s_1)$ and $x(s_2)$
on $\Gamma$ defines a cross-section of the shape, with length $l(s_1, s_2) = |x(s_1) - x(s_2)|$.
We can thus reduce the problem of studying the set of cross-sections of our shape
to the problem of studying the topography of the surface $S$ associated with the
height function $h : I^2 \to \mathbb{R}^+$ defined by $h(s_1, s_2) = \frac{1}{2} l(s_1, s_2)^2$.
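
A discrete version of this height function is easy to tabulate. The sketch below is a minimal illustration, assuming the boundary has already been sampled at equal arc-length steps; the name `height_surface` and the test shape are not from the paper. Entry `h[i, j]` is half the squared length of the cross-section joining samples `i` and `j`.

```python
import numpy as np

def height_surface(points):
    """h[i, j] = 0.5 * |x(s_i) - x(s_j)|**2 for boundary samples x(s_0), ..., x(s_{N-1})."""
    pts = np.asarray(points, dtype=float)            # shape (N, 2)
    diff = pts[:, None, :] - pts[None, :, :]         # all pairwise differences
    return 0.5 * np.sum(diff**2, axis=-1)            # shape (N, N)

# Example: a circle of radius 2 sampled at N equal angular steps, which is also
# a uniform arc-length sampling for a circle.
N, R = 64, 2.0
s = 2.0 * np.pi * np.arange(N) / N
circle = np.column_stack([R * np.cos(s), R * np.sin(s)])
h = height_surface(circle)
```

The table is symmetric with a zero diagonal, so only one triangle needs to be searched for valleys and ridges.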

The lowest (resp. highest) points of S correspond to places where the region 
bounded by $\Gamma$ is the narrowest (resp. the widest). More interestingly, the valleys
(resp. ridges) of this surface correspond to narrow (resp. wide) subsets of the 
region, which leads to the following definition of ribbons. 

Definition 2. The ribbon associated with the shape bounded by some curve $\Gamma$ is
the set of cross-sections whose end-points correspond to ridges and valleys of the
associated surface $S$. The narrow (resp. wide) ribbon is the subset of the ribbon
corresponding to a valley (resp. ridge) of $S$.

At this point, let us make a few remarks: 

• The narrow ribbons are of course of primary interest for implementing two-
dimensional GCs. However, we will see in the next section that wide ribbons also
capture interesting shape properties, such as certain symmetries. 

• The description of a given shape is uniquely defined: in particular, it is 
obviously independent of the choice of arc-length origin. In fact, as shown in 
[37], the valleys of a surface are invariant through monotonic transformations of
the height function (so replacing $h$ by $l$ for example would not change the ribbon
associated with a shape). 

• Although this representation superficially “looks like” a skeleton [95] or 
medial axis [16], it is fundamentally different, since a shape is described by a 
set of line segments instead of a set of disks. This is indeed an instance of 2D 
generalized cylinder [13]. 

• As noted earlier, using valleys and ridges to define ribbons allows us to 
construct the ribbon description of a given shape by constructing a discrete 
version of the surface $S$, then using some implementation of a valley/ridge finder
[34, 39] to find its valleys and ridges.

Figure 2 shows the narrow ribbons found in synthetic and real images using 
two very simple valley finders that we have developed to conduct preliminary 
experiments. The top of the figure shows results from the first implementation 
with, from left to right: the ribbon of a worm-like object, the mid-points of its 
cross-sections (spine), another synthetic example, and the silhouette of a person 
and the spine of its narrow ribbons. The bottom part of the figure shows results 
from our second program with, from left to right: the height function associated 
with a bottle shape, the ribbon cross-sections, and the associated spine. 





Fig. 2. The narrow ribbons extracted from synthetic and real images. 



Formal properties. We first give a geometric criterion that two points must satisfy 
to define a ribbon pair. Let us parameterize the curve by arc-length, and define 
the unit vectors u, v forming an orthonormal right-handed coordinate system, 
such that $x_1 - x_2 = l u$ (Fig. 3). We denote by $t_i$ and $n_i$ the unit tangent and
normal at $x_i$ ($i = 1, 2$), and by $\theta_i$ the angle between the vectors $u$ and
$t_i$.

Fig. 3. Notation.



Lemma 1. The ribbon associated with a two-dimensional shape is the set of 
cross-sections of this shape whose endpoints satisfy 

$(\cos^2\theta_1 - \cos^2\theta_2)\cos(\theta_1 - \theta_2) + l\cos\theta_1\cos\theta_2\,(\kappa_1\sin\theta_1 + \kappa_2\sin\theta_2) = 0.$    (1)

This lemma follows from the definition of ridges and valleys and from the 
fact that the gradient $\nabla h$ is an eigenvector of the Hessian $\mathcal{H}$ if and only if it is
a non-zero vector and

$(\mathcal{H}\nabla h) \times \nabla h = 0,$    (2)

where “×” denotes the operator associating to two vectors the determinant of
their coordinates. Rewriting this condition in geometric terms yields (1) after 
some algebraic manipulation. It should be noted that the eigenvalues of the 
Hessian can also be expressed geometrically in terms of the distance $l$, the angles
$\theta_1$, $\theta_2$ and the curvatures $\kappa_1$, $\kappa_2$. The formula, however, is a bit complicated and
not particularly illuminating, and it is omitted here. 
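
The algebra linking (2) to (1) can be checked numerically. The sketch below rests on conventions that are assumptions, since Fig. 3 is not reproduced here: the normal is the tangent rotated by +90 degrees, the curvature is signed so that the derivative of the tangent with respect to arc length equals the curvature times the normal, and each angle is the signed angle from $u$ to the tangent. With $h = \frac{1}{2} l^2$, differentiating in arc length gives the gradient and Hessian used in `determinant_condition`, and under these conventions the determinant in (2) equals $l^2$ times the left-hand side of (1). The ellipse and all function names are illustrative.

```python
import numpy as np

def ellipse_frame(phi, a=2.0, b=1.0):
    """Point, unit tangent, unit normal and signed curvature on an ellipse."""
    x = np.array([a * np.cos(phi), b * np.sin(phi)])
    d1 = np.array([-a * np.sin(phi), b * np.cos(phi)])    # dx/dphi
    d2 = np.array([-a * np.cos(phi), -b * np.sin(phi)])   # d2x/dphi2
    speed = np.linalg.norm(d1)
    t = d1 / speed
    n = np.array([-t[1], t[0]])                           # t rotated by +90 degrees
    kappa = (d1[0] * d2[1] - d1[1] * d2[0]) / speed**3
    return x, t, n, kappa

def lemma1_lhs(phi1, phi2):
    """Left-hand side of (1) and the cross-section length l."""
    x1, t1, _, k1 = ellipse_frame(phi1)
    x2, t2, _, k2 = ellipse_frame(phi2)
    d = x1 - x2
    l = np.linalg.norm(d)
    u = d / l
    v = np.array([-u[1], u[0]])
    th1 = np.arctan2(t1 @ v, t1 @ u)                      # signed angle from u to t1
    th2 = np.arctan2(t2 @ v, t2 @ u)
    lhs = ((np.cos(th1)**2 - np.cos(th2)**2) * np.cos(th1 - th2)
           + l * np.cos(th1) * np.cos(th2) * (k1 * np.sin(th1) + k2 * np.sin(th2)))
    return lhs, l

def determinant_condition(phi1, phi2):
    """(H grad h) x grad h for h = 0.5 * |x1 - x2|**2, derivatives taken in arc length."""
    x1, t1, n1, k1 = ellipse_frame(phi1)
    x2, t2, n2, k2 = ellipse_frame(phi2)
    d = x1 - x2
    g = np.array([d @ t1, -(d @ t2)])
    H = np.array([[1.0 + k1 * (d @ n1), -(t1 @ t2)],
                  [-(t1 @ t2), 1.0 - k2 * (d @ n2)]])
    Hg = H @ g
    return Hg[0] * g[1] - Hg[1] * g[0]

# A generic pair of boundary points, and a pair symmetric about the major axis.
generic_lhs, generic_l = lemma1_lhs(0.3, 2.1)
generic_det = determinant_condition(0.3, 2.1)
symmetric_lhs, _ = lemma1_lhs(0.7, -0.7)
```

For the symmetric pair the left-hand side of (1) vanishes, so that cross-section belongs to the ribbon; whether it lies on a ridge or a valley still depends on the sign of the other eigenvalue of the Hessian.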

More interestingly, Lemma 1 allows us to compare our ribbons with various
other types of symmetries. 

Lemma 2. Radial symmetries, bi-lateral symmetries, and worms are ribbons. In 
radial symmetries, concave pairs of symmetric points form narrow ribbons, and 
convex pairs form wide ribbons. Concave pairs of points in bi-lateral symmetries 
always form narrow ribbons. Worms are narrow ribbons. 

Lemma 2 shows that ribbons include some interesting shape classes (worms 
are ribbons obtained by sweeping a line segment with constant width perpen- 
dicular to some generating curve). For example, an ellipse admits the following 
description in terms of ribbons: one narrow ribbon corresponding to its ma- 
jor axis of symmetry, and two wide ribbons corresponding to its minor axis of 
symmetry and its radial symmetry. 

The lemma follows easily from (1), the formula for the Hessian’s eigenvalues 
mentioned earlier, and the angle and curvature properties of worms and radial or 
bi-lateral symmetries [80]. Note that (1) also provides another method for finding 
ribbons: construct a discrete version of the surface S, and find the zero-crossings 
of (1).

Part decomposition. As mentioned earlier, parts can be defined as the pieces of
an object which are well approximated by some primitive shape [77], as the pieces
of an object which are separated by some prototypical discontinuities [43, 85], 
or as a combination of both [48, 101]. Here we follow Binford’s suggestion and 
decompose complex shapes into ribbon parts whose cross-sections vary smoothly, 
so that cross-section area discontinuities delimit separate parts. 

Figure 4 shows examples using real images. The cross-section discontinuities 
are defined as valley endpoints as well as points where the width function is 
discontinuous. 




Fig. 4. Parts. From left to right: a hand, its parts, a person and his parts.



Let us stress again that we do not claim that the results shown in Figs. 2 
and 4 are better than those that would have been obtained by previous methods 
[43, 48, 101]. Nor do we claim at this point to have a robust program for part 
decomposition. Rather, we believe that our preliminary results demonstrate the 
feasibility of the approach and pave the way toward achieving part decomposition 
for three-dimensional objects, as discussed in the next section. 

3.3 Generalized Cylinders 

Definition. The definition of valleys presented in Section 3.1 has been extended 
by Barraquand and his colleagues [6] to arbitrary dimensions, with applications 
in robotic planning [7]. The generalization is as follows: the valley (resp. ridge) 
of a hypersurface defined in $\mathbb{R}^{n+1}$ by a height function $h : U \subset \mathbb{R}^n \to \mathbb{R}$ over an
n-dimensional domain U is the locus of the points where the gradient of h is an 
eigenvector of the Hessian, and where the eigenvalues associated with the other 
eigenvectors of the Hessian are positive (resp. negative). (Mixed eigenvalues yield 
topographic entities lying somewhere between valleys and ridges.) 

Barraquand has shown that valleys are (in general) one-dimensional, inde-
pendent of the value of $n$. This is intuitively obvious since the gradient $\nabla h$ is
an eigenvector of the Hessian $\mathcal{H}$ when the vectors $\nabla h$ and $\mathcal{H}\nabla h$ are parallel to
each other. This can be expressed by a system of $n - 1$ equations in $n$ unknowns,
defining a curve in the domain U. 

For example, in the case $n = 3$, the condition is again (2) where “×” denotes
this time the cross product. This vector condition yields two independent scalar 
equations in the three domain parameters, hence a curve. 
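
The same test can be written for any dimension. The sketch below is an illustration and not Barraquand's implementation; the name `valley_test_nd`, the tolerance and the toy functions are assumptions. It measures the part of the Hessian applied to the gradient that is orthogonal to the gradient, a vector confined to an (n - 1)-dimensional subspace, hence the n - 1 scalar equations, and then inspects the eigenvalues of the Hessian restricted to that subspace.

```python
import numpy as np

def valley_test_nd(grad, hess, tol=1e-9):
    """Return "valley", "ridge", "mixed", or None for an n-dimensional height function."""
    g = np.asarray(grad, dtype=float)
    H = np.asarray(hess, dtype=float)
    norm = np.linalg.norm(g)
    if norm < tol:
        return None
    g_hat = g / norm
    Hg = H @ g_hat
    # Component of H g orthogonal to g; it vanishes iff g is an eigenvector of H.
    residual = Hg - (g_hat @ Hg) * g_hat
    if np.linalg.norm(residual) > tol:
        return None
    # Orthonormal basis of the complement of g: columns 1..n-1 of Q.
    Q, _ = np.linalg.qr(np.column_stack([g_hat, np.eye(len(g))]))
    B = Q[:, 1:]
    others = np.linalg.eigvalsh(B.T @ H @ B)          # the remaining eigenvalues
    if np.all(others > tol):
        return "valley"
    if np.all(others < -tol):
        return "ridge"
    return "mixed"

# Toy function h(x, y, z) = x**2 + 2*y**2 + 0.1*z: its valley is the z axis, a curve.
grad = lambda x, y, z: (2.0 * x, 4.0 * y, 0.1)
hess = np.diag([2.0, 4.0, 0.0])

on_curve = valley_test_nd(grad(0.0, 0.0, 5.0), hess)
off_curve = valley_test_nd(grad(1.0, 0.0, 5.0), hess)
saddle_like = valley_test_nd((0.0, 0.0, 0.1), np.diag([2.0, -4.0, 0.0]))
```

The third call illustrates the mixed case mentioned above, where the remaining eigenvalues have different signs.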

We now show how the valleys of a height function defined over a three-
dimensional domain can be used to define generalized cylinders. In the three-
dimensional case, there is no natural parameterization of the cross-sections of a
shape in terms of points on its boundary. Instead, we propose the following idea: 
consider a volume V and the three-parameter family of all planes P(si, S 2 , S 3 ) 
in IR^ (we will not worry about the choice of the plane parameterization for the 
time being). We consider the hyper-surface S of IR^ associated with the height 
function h(si, S2, S3) = Area(R n F(si, S2, S3)). 

In other words, h measures the area of the slice of the volume V contained 
in the plane P. The set of valleys of the surface S is as before a one-dimensional 
set, and we can associate with each point in a valley the slice of V associated 
with the corresponding plane. This yields a description of V in terms of planar 
cross-sections swept and deformed in space. 
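
To make the height function concrete, here is a rough numerical sketch of $h$ for a solid given by an inside/outside predicate. It is only an illustration under stated assumptions: the plane is parameterized globally by the spherical angles of its unit normal and its signed distance to the origin, the area is estimated by counting grid cells on the plane, and the names `plane_frame`, `slice_area` and `unit_ball` are hypothetical.

```python
import math


def plane_frame(theta, phi):
    """Unit normal n and an orthonormal in-plane basis (u, v) for spherical angles."""
    n = (math.sin(theta) * math.cos(phi),
         math.sin(theta) * math.sin(phi),
         math.cos(theta))
    u = (math.cos(theta) * math.cos(phi),
         math.cos(theta) * math.sin(phi),
         -math.sin(theta))
    v = (-math.sin(phi), math.cos(phi), 0.0)
    return n, u, v


def slice_area(inside, theta, phi, d, extent=2.0, steps=300):
    """Estimate Area(V ∩ P) by counting grid cells of the plane that fall in V.

    inside: predicate taking a 3-D point and returning True inside the volume V.
    (theta, phi, d): assumed plane parameters (normal angles, signed distance).
    extent: half-width of the sampled square on the plane; must cover the slice.
    """
    n, u, v = plane_frame(theta, phi)
    cell = 2.0 * extent / steps
    count = 0
    for i in range(steps):
        a = -extent + (i + 0.5) * cell
        for j in range(steps):
            b = -extent + (j + 0.5) * cell
            p = tuple(d * n[k] + a * u[k] + b * v[k] for k in range(3))
            if inside(p):
                count += 1
    return count * cell * cell


def unit_ball(p):
    """Assumed test volume: the unit ball centered at the origin."""
    return p[0] ** 2 + p[1] ** 2 + p[2] ** 2 <= 1.0
```

For the unit ball the exact slice area is $\pi(1 - d^2)$ whatever the orientation of the plane, which gives a simple check of the estimate. A valley detector would then search the three-parameter domain for points satisfying the local criterion on this sampled function.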

Definition 3. The generalized cylinder associated with a shape is the set of 
cross-sections corresponding to the valleys of the associated surface S. 

Time for a few more remarks: 

• The definition of a GC given above is not quite satisfying: it depends 
on the choice of plane parameterization used in defining the height function h. 
Changing the parameterization will not merely scale the values of h like the 
monotonic transformations mentioned in Section 3.2: it will deform the domain 
over which the surface S is defined and thus change its valleys. This difficulty 
stems from the fact that there is no obvious natural parameterization similar to 
arc length in the three-dimensional case. 

We propose to use a local parameterization of planes in the neighborhood of 
a given cross-section to guarantee that the GC description attached to a given 
shape is uniquely defined: we attach to each cross-section a reference frame 
whose origin is its center of mass and whose x, y axes are its inertia axes. This 
allows us to parameterize each plane in a neighborhood of the cross-section by 
the spherical coordinates of its normal and the signed distance to the origin. In 
turn, we can use this parameterization and the local criterion (2) to determine 
potential GC cross-sections of an object (of course, the sign of the eigenvalues 
of the Hessian $\mathcal{H}$ must also be checked). The GC description of a given shape is obviously 
uniquely defined, independent of any global coordinate system. It is in fact easy 
to show that it is also independent of the particular choice of the coordinate axes 
in the cross-section plane. Note however that the representation (i.e., the height 
field and its valleys) depends on the particular local plane parameterization 
chosen, e.g., we could have used Cartesian coordinates instead of spherical ones 
for the normal. Likewise, we could have used a different origin for the coordinate 
frame associated to each cross-section: the only objective reason for choosing 
the center of mass is that it defines an origin uniquely attached to the shape. 
Clearly, more research is needed in this area, and we will investigate alternate 
plane parameterizations. 

• In the three-dimensional case, we do not make the distinction between 
“narrow” and “wide” generalized cylinders because it is not intuitively clear what 
the ridges of S would correspond to. Also, remember that mixed eigenvalue 
signs in the three-dimensional case may give rise to intermediate topographical 
entities whose properties are at this point poorly understood. 

• As in the two-dimensional case, the “spine” of our representation is the 
one-dimensional set of cross-sections, not a particular curve in three-dimensional 
space. If such a curve is desired, the locus of centers of mass of the cross-sections is 
a natural choice, given the parameterization proposed above. This is in agreement 
with Binford’s remarks quoted earlier [13]. 
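
The local reference frame proposed in the first remark (origin at the center of mass, axes along the inertia axes of the cross-section) can be sketched in a few lines. This is an illustrative sketch only: it assumes the cross-section is available as a uniform sample of 2-D points expressed in some coordinates of its plane, and the function name `cross_section_frame` is hypothetical.

```python
import math


def cross_section_frame(points):
    """Center of mass and inertia axes of a planar cross-section.

    points: list of (x, y) samples assumed to cover the cross-section uniformly.
    Returns (centroid, x_axis, y_axis), where x_axis is the direction of
    largest spread and y_axis is perpendicular to it.
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Second-order central moments of the sample.
    sxx = sum((p[0] - cx) ** 2 for p in points) / n
    syy = sum((p[1] - cy) ** 2 for p in points) / n
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / n
    # Orientation of the principal axis of the 2x2 moment matrix.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    x_axis = (math.cos(angle), math.sin(angle))
    y_axis = (-math.sin(angle), math.cos(angle))
    return (cx, cy), x_axis, y_axis
```

The locus of the centroids returned for successive cross-sections is the candidate spine curve mentioned in the last remark. The axes are undefined when the two moments are equal (for example for a disk), a case this sketch does not handle specially.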

Where do we stand? We are still at a very early stage of our research for three- 
dimensional shapes. At this point, we have shown the following lemma. 

Lemma 3. Solids of revolution are generalized cylinders. 

The proof is given in [21]. Briefly, it computes analytically the area of the 
cross-section of a solid of revolution by an arbitrary plane, then goes on to show 
that cross-sections orthogonal to the axis direction are valley points. We plan to 
investigate other classes of GCs such as circular tubes (three-dimensional worms), 
SHGCs and canal surfaces (the envelopes of one-dimensional families of spheres). 

There are other properties of our GCs that we plan to investigate in the near 
future: 

• Stability: we plan to explore both theoretically and empirically the sta- 
bility of the representation. In particular, it would be desirable that the GC 
descriptions of two similar shapes also be similar. Characterizing the stability 
of any shape representation scheme is difficult (defining shape similarity is in 
itself a non-trivial problem). An instance of this problem is whether the GC 
description of a shape that can be approximated by a simple GC class (e.g., a 
solid of revolution) has a GC description similar to an instance of that class (in 
the solid of revolution example, roughly circular cross-sections swept along an 
almost straight line). As noted several times before, this is very important since, 
except in very special situations, real objects and/or their parts will not be exact 
instances of simple primitive shapes [42]. 

• Projective properties: do our generalized cylinders project onto ribbons? 
This is at present an open question, and we will try to give it a rigorous answer. 
Understanding the stability of the GC representation will provide partial answers 
(e.g., solids of revolution project to bi-lateral symmetries under weak perspective 
[72]). We will of course try to articulate more complete ones. 

We have implemented a simple program that uses the marching cubes algorithm 
[61] to detect valleys in a three-dimensional terrain, and Fig. 5 shows 
very preliminary experiments. Note that we have not implemented the local pa- 
rameterization of GCs, and our program uses a global plane parameterization 
instead. Much more work is of course needed in this area. 

Once we have developed robust algorithms for constructing GC descriptions 
of complex shapes, we will attack the part-decomposition problem using an approach 
similar to the one developed in the two-dimensional case. More generally, 
the ultimate goal of this project is of course to develop methods using part-whole 
hierarchies describing objects and object classes for efficient indexing in object 
recognition [14, 74]. 

Fig. 5. Some simple shapes and their GC description. 

Another interesting issue is how to introduce a notion of scale in the representation. 
This would allow us to maintain hierarchies of part decompositions 
potentially useful in coarse-to-fine matching. At this point this remains an open 
problem. A different application of scale-space techniques to shape representa- 
tion is considered in the next section. 

4 Evolving Surfaces 

Imagine a smooth density function defined over a volume. The set of points 
where the density exceeds a given threshold defines a solid shape whose surface 
is a level set of the density. Blurring the density function will change its level 
sets and the shape it defines. When do “important” changes occur? This is the 
question addressed by Koenderink in the section dedicated to dynamic shape of 
his book [51], where he goes on to give several examples of structural changes 
under the name of morphological scripts. Recently, Bruce, Giblin and Tari [19] 
have given a complete classification of the structural changes in the parabolic set 
of a smooth surface undergoing a one-parameter family of deformations, as well 
as geometric and analytical conditions under which these transitions happen. 
We address in this section the problem of actually computing these transitions. 
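
A one-dimensional toy version of this thought experiment shows the kind of structural change at stake. The sketch below is an illustration only and is unrelated to the algebraic method developed in this section: the density is assumed to be a sum of two unit-mass Gaussian bumps, blurring is modeled by increasing their common standard deviation `sigma`, and the number of connected components of the set where the density exceeds a threshold is counted on a grid. All names and numerical values are assumptions.

```python
import math


def blurred_density(x, sigma, centers=(-1.0, 1.0)):
    """Sum of unit-mass Gaussians of standard deviation sigma (assumed model)."""
    amp = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return sum(amp * math.exp(-(x - c) ** 2 / (2.0 * sigma ** 2)) for c in centers)


def count_components(sigma, threshold=0.3, lo=-6.0, hi=6.0, n=2401):
    """Number of intervals of the sampled line where the density exceeds threshold."""
    count = 0
    inside_prev = False
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        inside = blurred_density(x, sigma) > threshold
        if inside and not inside_prev:
            count += 1  # entering a new component of the superlevel set
        inside_prev = inside
    return count
```

With these assumed values the "shape" has two components for a small amount of blur, a single one for an intermediate amount, and none once the density has flattened below the threshold. The transitions between these regimes are the one-dimensional analogues of the critical events sought here.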

4.1 Background 

The idea of capturing the significant changes in the structure of a geometric 
pattern as this pattern evolves under some family of deformations has a long 
history in computer vision. For example, Marr [65] advocated constructing the 
primal sketch representation of an image by keeping track of the patterns that do 
not change as the image is blurred at a variety of scales. The idea of recording 
the image changes under blurring led to the scale-space approach to image 
analysis, which was first proposed by Witkin [117] in the case of inflections of a 
one-dimensional signal, and has since been applied to many domains, including 
the evolution of curvature patterns on curves [3, 58, 64] and surfaces [82], and 
more recently, the evolution of curves and level sets under diffusion processes 
[49, 96]. 

Koenderink’s and Van Doorn’s aspect graphs [52, 53] provide another example 
where a geometric pattern can be characterized by its significant changes. 
This time, the objective is to characterize the possible appearances of an object. 
The range of viewpoints is partitioned into maximal regions where the (qualita- 
tive) appearance of an object remains the same, separated by critical boundaries 
where the structure of the silhouette changes according to some visual event. The 
maximal regions (labeled by the object appearance at some sample point) are 
the nodes of the aspect graph, and the visual events separating them are its arcs. 
For smooth surfaces, a complete catalogue of visual events is available from sin- 
gularity theory [2, 52, 53, 87], and it has been used to compute the aspect graphs 
of surfaces of revolution [28, 56], algebraic surfaces [79, 88], and empirical sur- 
faces defined as the level set of volumetric density functions [76] (see [109] for 
related work). 

In his book [51], Koenderink addressed the problem of understanding the 
structural changes of the latter type of surfaces as the density function undergoes 
a diffusion process. He focused on the evolution of certain surface attributes, 
namely, the parabolic curves and their images via the Gauss map, which are 
significant for vision applications: for example, the intersection of a parabolic 
curve with the occluding contour of an object yields an inflection of the silhouette 
[50], and the asymptotic directions along the parabolic curves form the lip and 
beak-to-beak events of the aspect graph [47]. Koenderink proposed to define 
morphological scripts that record the possible transformations of a given shape 
and use these as a language for describing dynamic shape. Bruce, Giblin and Tari 
[19] have used singularity theory to expand Koenderink’s work and establish a 
complete catalogue of the singularities of the parabolic set under one-parameter 
families of deformations. We recall their results before presenting an approach 
to computing the critical events that they have identified. Once these events 
have been computed, we characterize the structure of the parabolic set and its 
Gaussian image at a sample point between each pair of critical events, which 
yields a data structure similar to the aspect graph but parameterized by time 
instead of viewpoint. 



4.2 Singularities of Evolving Surfaces 

This section introduces the parabolic set, its Gaussian image, and their singular- 
ities in the context of contact between planes and surfaces. It also summarizes 
the results of Bruce, Giblin and Tari [19]. 

Generic singularities. The intersection of a surface with the tangent plane at 
one of its points can take several forms (Fig. 6): for most points, this intersection 
is either reduced to the point itself (this is the case for elliptic points, where the 
surface has locally the shape of an ovoid or the inside of an egg shell, see Fig. 
6(a)) or composed of two curve branches intersecting transversally (this happens 
at hyperbolic points, where the surface has locally the shape of a saddle, see Fig. 
6(b)). Elsewhere, the intersection may consist of a curve that cusps at the point 
of interest (which is then said to be parabolic, see Fig. 6(c)), a unode (a double 
extremum of the height function measured along the surface normal, see Fig. 
6(d)), or a tacnode (two smooth curve branches joining tangentially, see Fig. 
6(e)). Points corresponding to the last two cases are called godrons, ruffles [51], 
or cusps of the Gauss map, because the curve traced on the Gauss sphere by the 
unit surface normals along the parabolic curve has a cusp at these points. 

For generic surfaces, there are no other possibilities; elliptic and hyperbolic 
points form extended areas of the surface, separated by smooth curves made 
of parabolic points; the cusps of Gauss are isolated points on these parabolic 
curves. 







Fig. 6. Contact of a surface with its tangent plane: (a) an elliptic point, (b) a hyper- 
bolic point, (c) a parabolic point, (d) a unode, (e) a tacnode. 



Singularities of one-parameter families of surfaces. As shown in [19], there are a 
few more possibilities in the case of one-parameter families of deforming surfaces: 
indeed, three types of higher-order contact may occur at isolated values of the 
parameter controlling the deformation. The corresponding singularities are called 
A3, A4 and D4 transitions following Arnold’s conventions [2], and they affect 
the structure of the parabolic set as well as its image under the Gauss map (Fig. 7). 

There are four types of A3 transitions: in the first case, a parabolic loop 
disappears from the surface, and an associated loop with two cusps disappears 
on the Gauss sphere (this corresponds to a lip event in catastrophe theory jargon, 
see Fig. 7(a)). In the second case, two smooth parabolic branches join, then split 
again into two branches with a different connectivity, while two cusping branches 
on the Gauss sphere merge then split again into two smooth branches (this is a 
beak-to-beak event, see Fig. 7(b)). Two additional singularities are obtained by 
reversing these transitions. 

At an A4 event, the parabolic curve remains smooth but its Gaussian image 
undergoes a swallowtail transition, i.e., it acquires a higher-order singularity that 
breaks off into two cusps and a crossing (Fig. 7(c)). Again, the transition may 
be reversed. Finally, there are four D4 transitions. In the first one (Fig. 7(d)), 
two branches of a parabolic curve meet then split again immediately; a similar 
phenomenon occurs on the Gauss sphere, with a cusp of Gauss “jumping” from 
one branch to the other. In the second transition (Fig. 7(e)), a parabolic loop 
shrinks to a point then expands again into a loop; on the Gauss sphere, a loop 
with three cusps shrinks to a point then reappears. The transitions can as usual 
be reversed. 

Fig. 7. The singularities of evolving surfaces. The events are shown on both the surface 
(left) and the Gauss sphere (right). The actual events are shown as black disks, and 
the (generic) cusps of Gauss are shown as white disks. See text for more details. 



4.3 Computing the Singularities 

We now present an approach to computing the singularities of evolving algebraic 
surfaces. We first derive the equations defining these singularities, then propose 
an algorithm for solving these equations and finally present results from our 
implementation. 



Singularity equations. Bruce, Giblin and Tari [19] give explicit equations for all 
the singularities in the case of a surface defined by a height function $z = f(x, y)$. 
Here we are interested in surfaces defined by some implicit equation $F(x, y, z) = 0$. 
Although it is possible to use the chain rule to rederive the corresponding 
equations in this case, this is a complex process [109] and we have chosen instead 
to first construct the equations characterizing the parabolic curves and the cusps 
of Gauss, which turn out to have a very simple form, then exploit the fact that 
the parabolic set or its image under the Gauss map are singular at critical events 
to characterize these events. 

The normal to the surface defined by $F(x, y, z) = 0$ is of course given by the 
gradient $\nabla F = (F_x, F_y, F_z)^T$. As shown in [79, 109, 114], the parabolic curves 
are defined by $F(x, y, z) = P(x, y, z) = 0$, where $P = \nabla F^T A \nabla F$ and $A$ 
denotes the symmetric matrix 

$$
A = \begin{pmatrix}
F_{yy}F_{zz} - F_{yz}^2 & F_{xz}F_{yz} - F_{zz}F_{xy} & F_{xy}F_{yz} - F_{yy}F_{xz} \\
F_{xz}F_{yz} - F_{zz}F_{xy} & F_{zz}F_{xx} - F_{xz}^2 & F_{xy}F_{xz} - F_{xx}F_{yz} \\
F_{xy}F_{yz} - F_{yy}F_{xz} & F_{xy}F_{xz} - F_{xx}F_{yz} & F_{xx}F_{yy} - F_{xy}^2
\end{pmatrix}.
$$

(Note that $A\mathcal{H} = |\mathcal{H}|\,\mathrm{Id}$, where $\mathcal{H}$ denotes the Hessian matrix associated to 
$F$.) 
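
As a sanity check of these definitions, the sketch below evaluates $P = \nabla F^T A \nabla F$ numerically. It is an illustration, not the Mathematica code used by the authors: the test surface is assumed to be a torus with radii R = 2 and r = 1, written as $F = (x^2 + y^2 + z^2 + R^2 - r^2)^2 - 4R^2(x^2 + y^2)$, whose parabolic curves are known to be its top and bottom circles. The function names are hypothetical.

```python
def adjugate_sym3(h):
    """Matrix A of the text: the adjugate of a symmetric 3x3 Hessian h."""
    fxx, fxy, fxz = h[0]
    fyy, fyz = h[1][1], h[1][2]
    fzz = h[2][2]
    return [
        [fyy * fzz - fyz ** 2, fxz * fyz - fzz * fxy, fxy * fyz - fyy * fxz],
        [fxz * fyz - fzz * fxy, fzz * fxx - fxz ** 2, fxy * fxz - fxx * fyz],
        [fxy * fyz - fyy * fxz, fxy * fxz - fxx * fyz, fxx * fyy - fxy ** 2],
    ]


def parabolic_function(grad, hess):
    """P = grad^T A grad; it vanishes on the parabolic curves of F = 0."""
    a = adjugate_sym3(hess)
    return sum(grad[i] * a[i][j] * grad[j] for i in range(3) for j in range(3))


def torus_derivatives(x, y, z, R=2.0, r=1.0):
    """Gradient and Hessian of the assumed torus equation at (x, y, z)."""
    s = x * x + y * y + z * z + R * R - r * r
    grad = [4 * s * x - 8 * R * R * x, 4 * s * y - 8 * R * R * y, 4 * s * z]
    hess = [
        [4 * s + 8 * x * x - 8 * R * R, 8 * x * y, 8 * x * z],
        [8 * x * y, 4 * s + 8 * y * y - 8 * R * R, 8 * y * z],
        [8 * x * z, 8 * y * z, 4 * s + 8 * z * z],
    ]
    return grad, hess
```

On this torus, P is positive at the outer equator point (3, 0, 0), negative at the inner equator point (1, 0, 0) and zero at the top point (2, 0, 1), in agreement with the elliptic, hyperbolic and parabolic character of these points.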

The cusps of Gauss are points where the asymptotic direction along the 
parabolic curve is tangent to this curve [51]. Let us show that the asymptotic 
direction at a parabolic point is $a = A\nabla F$. Asymptotic tangents are vectors of 
the tangent plane that are self-conjugate. The first condition is obviously satisfied 
at a parabolic point since $a \cdot \nabla F = \nabla F^T A \nabla F = 0$. The second condition is also 
obviously satisfied since $a^T \mathcal{H} a = |\mathcal{H}|\,(a \cdot \nabla F) = 0$. Since the tangent to the 
parabolic curve is given by $\nabla P \times \nabla F$, it follows that the cusps of Gauss are 
given by $F = P = C = 0$, where $C = \nabla P^T A \nabla F$. 

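Note that $\nabla P$ has a relatively simple form: 

$$
\nabla P = \frac{\partial}{\partial \mathbf{x}} \left( \nabla F^T A \nabla F \right)
= \begin{pmatrix} \nabla F^T A_x \nabla F \\ \nabla F^T A_y \nabla F \\ \nabla F^T A_z \nabla F \end{pmatrix}
+ 2|\mathcal{H}|\,\nabla F,
$$

since $\left( \frac{\partial}{\partial \mathbf{x}} \nabla F^T \right) A = \mathcal{H}A = \mathrm{Det}(\mathcal{H})\,\mathrm{Id}$. In particular, this simplifies the expression 
of $C$ since the $\nabla F$ term above cancels in the dot product. Similar simplifications 
occur during the computation of the non-generic singularities below. 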

We are now ready to characterize these singularities. Note that in the case of 
a surface undergoing a family of deformations parameterized by some variable $t$, 
$F$, $P$ and $C$ are also functions of $t$. Let us first consider A3 and D4 transitions. 
Since they yield singular parabolic curves, they must satisfy $F = P = 0$ and 

$$
\nabla_{\mathbf{x}} F(x, y, z, t) \times \nabla_{\mathbf{x}} P(x, y, z, t) = 0,
$$

where $\nabla_{\mathbf{x}}$ denotes the gradient operator with respect to $\mathbf{x} = (x, y, z)$, and the third equation 
simply states that the normals to the original surface and the “parabolic” 
surface defined by $P = 0$ are parallel. This is a vector equation with three scalar 
components, but only two of these components are linearly independent. It follows 
that the singularities of the parabolic set are characterized by four equations 
in four unknowns. 

The case of A4 singularities is a little more complicated since the parabolic 
set is smooth there. On the other hand, the curve defined in $(x, y, z, t)$ space by the cusps 
of Gauss is singular, and the A4 singularities can thus be found by solving the 
system of equations defined by $F = P = C = 0$ and 

$$
|\nabla_{\mathbf{x}} F, \nabla_{\mathbf{x}} P, \nabla_{\mathbf{x}} C| = 0.
$$

Solving the equations. Our objectives are to compute all the critical events and 
characterize the structure of the parabolic set and its Gaussian image at a sample 
point between each pair of critical events. This yields a data structure similar 
to the aspect graph but parameterized by time instead of viewpoint. 

All critical events are characterized by systems of four polynomial equations 
in four unknowns. They can thus be found using homotopy continuation [67], a 
global root finder that finds all the solutions (real or complex) of square systems 
of polynomial equations. 
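
The principle of homotopy continuation can be shown on a single polynomial in one unknown. The sketch below is a deliberately naive toy, far simpler than the parallel implementation used for the four-equation systems of this section: it deforms the start polynomial $x^d - 1$, whose roots are known, into the target polynomial, and follows each root with a Newton corrector at fixed steps of the homotopy parameter. The start system, the complex constant `gamma`, the step counts and all names are assumptions made for illustration.

```python
import cmath


def poly_eval(coeffs, x):
    """Horner evaluation; coefficients are listed from highest degree down."""
    result = 0j
    for c in coeffs:
        result = result * x + c
    return result


def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    d = len(coeffs) - 1
    return [c * (d - i) for i, c in enumerate(coeffs[:-1])]


def solve_by_homotopy(coeffs, steps=200, newton_iters=5, gamma=0.6 + 0.8j):
    """Track the d roots of x^d - 1 to the roots of the target polynomial.

    Homotopy: H(x, t) = (1 - t) * gamma * (x^d - 1) + t * p(x), t from 0 to 1.
    The non-real constant gamma keeps the paths away from singular points
    in generic cases.
    """
    d = len(coeffs) - 1
    start = [1.0] + [0.0] * (d - 1) + [-1.0]
    roots = [cmath.exp(2j * cmath.pi * k / d) for k in range(d)]
    for s in range(1, steps + 1):
        t = s / steps
        h = [(1 - t) * gamma * g + t * p for g, p in zip(start, coeffs)]
        dh = poly_deriv(h)
        corrected = []
        for x in roots:
            # Newton corrector started from the root found at the previous step.
            for _ in range(newton_iters):
                x = x - poly_eval(h, x) / poly_eval(dh, x)
            corrected.append(x)
        roots = corrected
    return roots
```

For the target $x^2 - 3x + 2$ the start roots $1$ and $-1$ are carried to the roots $1$ and $2$. A production solver would add a predictor step, adaptive step size control and special handling of paths that diverge to infinity, none of which is attempted here.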

Between singularities, the structure of the parabolic set does not change, 
and the curve tracing algorithm proposed in [57, 79] is used to determine its 
structure. Briefly, the algorithm traces an algebraic curve by first using homo- 
topy continuation to find all its extremal points (including singularities) in some 
direction, as well as sample points on the smooth branches bounded by those 
extrema, then marching numerically from the samples to the adjacent extrema. 
See [57, 79] for details. 

We have implemented this approach. All algebraic manipulations, including 
the computation of the result of Gaussian diffusion and linear morphing pro- 
cesses and the derivation of the singularity equations have been implemented in 
Mathematica. The singularities are computed using a parallel implementation 
of homotopy continuation [104] which allows the construction of the continua- 
tion paths to be distributed among a network of Unix workstations. The curve 
tracing algorithm described in [57, 79] has been implemented in Mathematica. 

Figure 8(a) shows the results of applying Gaussian diffusion to a dimple- 
shaped quartic surface defined by 

$$(4x^2 + 3y^2)^2 - 4x^2 - 5y^2 + 4z^2 - 1 = 0.$$

The surface and its parabolic curves are shown in the first row of the figure, 
and the corresponding Gaussian image is shown in the second one. As before, 
generic cusps of Gauss are shown as white discs. Singularities occur in the second, 
fourth and sixth column. There is a last singularity which is not shown in the 
figure, and that corresponds to the disappearance of the surface. 

Figure 8(b) shows the singularities found when linearly morphing the dimple 
into a squash-shaped surface defined by 

$$4y^4 + 3xy^2 - 5y^2 + 4z^2 + 6x^2 - 2xy + 2x + 3y - 1 = 0.$$

The evolving surface is shown in the first two rows, and its Gaussian image in 
the two bottom ones. As in the previous case, we have not found any A4 event. 



4.4 Toward a Scale-Space Aspect Graph 

We show in this section preliminary results in an effort to characterize the change 
in visual appearance of an object as it undergoes a diffusion process. Approaches 
to the construction of such a scale-space aspect graph can be found in [29] in the 
two-dimensional polygonal case and [99] in the three-dimensional polyhedral 
case. Here we attack the case of curved surfaces formed by the zero set of poly- 
nomial density functions, focusing on the case of solids of revolution. Equations 
for the visual events of solids of revolution can be found in [28, 56]. These visual 
events form parallels of constant latitude on the unit sphere. When the diffusion 
parameter is added, the events transform into curves in $(\sigma, \beta)$ space, where $\beta$ 
denotes the angle between the axis of the solid of revolution and the viewing 
direction. 

Computing the singularities of these curves for a sample solid of revolution 
yields the scale-space aspect graph shown in Fig. 9. Note that multi-local visual 
events have not been traced yet, so aspects that only differ through T-junctions 
are considered equivalent. Figure 9(top) shows the regions of the $(\sigma, \beta)$ plane 
delimited by visual events. There are two “diffusion events” (dashed vertical lines 
in the figure) in this case: the first one corresponds to the surface splitting into 
two parts, and the second one corresponds to the disappearance of its parabolic 
lines. A close-up of the diagram near these two events is shown in Fig. 9(top-right).* 
Note that the visual event curve with a vertical inflection switches from 
beak-to-beak to lip at the inflection. 

Fig. 8. Evolving shapes: (a) diffusion of a dimple; (b) morphing the dimple into a 
squash. 

The two diffusion events separate three qualitatively different aspect graphs. 
The aspects and visual events corresponding to the first one are shown in Fig. 
9(middle). Close-ups of the aspects and visual events associated with the second 
one are shown in Fig. 9(bottom). These are centered at the corresponding singularities 
for clarity. Ignoring multi-local events, there is only one aspect for the 
third aspect graph, and it is not shown in the figure. 

[Figure 9 panels: (E) (F) (G) (H); (AB) (BC) (CD)] 

Fig. 9. Scale-space aspect graph. See text for details. 

* There are in fact two more diffusion events that occur for larger values of $\sigma$ and are 
not shown in the figure. They correspond to the disappearance of the two parts of 
the surface after splitting. 

As in the case of generalized cylinders, much more work is needed here: this 
includes computing the multi-local visual events, generalizing our approach to 
algebraic surfaces that do not bound solids of revolution, and linking the changes 
in aspect graph structure to the singularities of evolving parabolic and flecnodal 
curves. An even more challenging problem is to consider the effect of diffusion 
processes applied to the image instead of the surface, which is a much more 
realistic model of optical blur. A lot is left to learn. 

1 Introduction 

The problem of computing visual correspondence is central in many applications 
in computer vision, but it is notoriously difficult compared to the ease with which we 
solve it perceptually when looking at images. Given two images of the same object 
from different viewpoints, or even two different instances of the same object 
category, humans are in general able to establish a point to point correspondence 
between the images. In an example such as Fig. 1 there is no “correct” geometric 
answer to the correspondence problem. This raises the question of what rules 
actually govern the establishment of correspondence. One alternative would be 
that the objects in the images are recognised as birds and point correspondence 
is established based on correspondence of part primitives such as head, 
throat etc., which in general have invariant relations for the category of birds. 
Another alternative, however, would be to assume that correspondence is established 
based on some general similarity measure of image shapes, independent 
of the fact that the images are recognised as birds. In that case, correspondence 
can be used as a basis for recognition and categorisation instead of the other way 
around. The two alternatives can easily be seen to be connected to alternative 
theories of object recognition and categorisation that have been proposed over 
the years: 1. Recognition by components (RBC), where objects are described as 
composed of generic parts that are recognised in the image (Marr and Nishihara 
1978, Biederman 1985) and 2. Recognition based on similarity of views (Bülthoff 
et al. 1995, Tarr et al. 1995) which can form a basis for object categorisation 
(Edelman 1998). In this paper we will take the second point of view and define 
a general shape concept that can be used in the definition and computation of 
correspondence. 

Fig. 1. By looking at pictures of instances of the same category of objects we 
are in general able to establish point to point correspondence. 

Historically the problem of computing correspondence between deformed 
shapes has been formulated as finding the minimum of a cost function defined 
on the two shapes (Burr 1981, Yoshida and Sakoe 1982). For a recent review see 
(Basri et al. 1998). This is most often limited to the case when the shapes are 
described by their outlines so that the matching is a problem of curve matching. 
The need to a priori extract the outlines of the shapes limits the applicability 
of these methods. The definition of the cost function and the complexity of the 
matching are other problems that have to be faced. 

The approach we will propose does not require any presegmentation of shape 
outlines but works for arbitrary 2-D shapes. The input data to the algorithm 
will consist of groups of image coordinates with an associated tangent direction 
that can be obtained with extremely simple means. The input data structure has 
been simplified deliberately in order to avoid any complex perceptual grouping 
stage. 

2 Order Structure 

Order structure can be seen as the generalisation of the concept of ordering in 
one dimension to several dimensions (Goodman and Pollack 1983, Bjorner et al. 
1993). Order properties can be defined for point sets and other algebraically defined 
sets of features in arbitrary dimension. In (Carlsson 1996) the concept was 
used for sets of line segments in an algorithm for partial view invariant recognition 
of simple categories. It is also closely related to the idea used to describe 
qualitative difference of views in the concept of the aspect graph (Koenderink 
and van Doorn 1979). For computer vision problems, geometric structure can 
be described in a unified stratified way ranging from metric, affine and projective 
to that of order and incidence structure (Carlsson 1997). This can also be 
seen as going from quantitative detailed descriptions to qualitative and general 
ones, which is necessary if we want to describe equivalence of instances of object 
categories or different viewpoints of an object. 

Any structure concept defines equivalence classes of geometric shapes. Order 
structure seems to imply equivalence classes that are in accordance with those 
that are subjectively reported by humans, which makes it especially interesting 
to use for problems of visual recognition and categorisation. 



2.1 Capturing Perceptual Shape Equivalence 

If we look at the sequence of deformed shapes A-F in Fig. 2 we see that they are 
easily divided into two qualitatively different classes by the fact that the shape 
changes between C and D from a purely convex shape to a shape with a concavity. The 
concept of order structure can be used to capture the perceptual equivalence 
classes of shapes A-C and D-G respectively. If we sample the shapes D and F at 
five points and draw the tangents at those points (Fig. 3), the qualitative structure 
of the combined arrangement of points and lines can be seen to be in agreement in 
the sense that there is a one to one correspondence between the points of the 
two shapes such that the intersections of corresponding tangent lines are ordered 
relative to each other and to the points in exactly the same way. This relative 
ordering can be given a formal treatment using the concept of order structure 
for the arrangement of points and lines. The same order structure of points and 
tangents can be obtained by sampling any of shapes D-F at five points. Note 
that the sampling points are in general in perceptual correspondence between 
different shapes. It is not possible to obtain the same order structure by sampling 
shapes A, B or C, however, because they are convex and lack the 
concavity of shapes D-F. The fact that subsets of points and tangent lines can 
be found with the same order structure is a potential tool for classifying shapes 
into equivalence classes and for establishing point to point correspondence. Note 
that there are other subsets of five points and tangents that have the same order 
structure through all of the shapes A-F. Fig. 4 illustrates that given perceptually 
corresponding points of two shapes we can get equivalent order structure based 
on points and tangent lines. 

[Figure 2 panels: A B C D E F G] 

Fig. 2. Sequence of successive deformations 

Fig. 3. Equivalent order structure from points and tangent lines of shapes D and 
G 

2.2 Point Sets 

The order structure of a set of points and lines can be used to define equivalence 
classes of shapes whose members are qualitatively or topologically similar, i.e. 
they can be obtained from each other by a deformation that preserves order 
structure. Order structure is a natural extension of the concepts of affine and 
projective structure in the sense that all affine and projective transformations 
that preserve orientation also preserve order structure. The class of order structure preserving 
transformations or deformations is however much larger than the classes of affine 
and projective transformations. For a set of points in the plane, order 
structure is denoted order type. The order structure defining property for point 
sets is that of orientation. Three points 1, 2, 3 have positive orientation if traversing 
them in order means anti-clockwise rotation. The orientation is negative if 
the rotation is clockwise. This can be established formally by computing the sign 
of the determinant: 



\[ \operatorname{sign}\,[\,p_1 \;\; p_2 \;\; p_3\,] \tag{1} \]

of the oriented homogeneous coordinates:

\[ p_i = (x_i,\; y_i,\; 1)^T, \qquad i = 1, 2, 3 \tag{2} \]

Fig. 4. Equivalent order structure from points and tangent lines




For an arbitrary set of points, the order type is uniquely determined by the set 
of mappings: 



\[ \chi_p(i,j,k) = \operatorname{sign}\,[\,p_i \;\; p_j \;\; p_k\,] \in \{-1, 1\} \tag{3} \]

for all points i,j,k in the set. The various order types that exist for 3,4 and 
5 points are shown in fig. 5. Note that for five points we get unique canonical 
orderings of the points for the first two order types. For the third one, ordering 
is ambiguous up to a cyclic permutation. 
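
As an illustration only, the orientation sign of equation (1) and the set of signs in equation (3) can be sketched in Python. The function names are assumptions made for this sketch and do not come from the original implementation. A collinear triple, which the text assumes does not occur, is reported as 0.

```python
from itertools import combinations


def orientation(p1, p2, p3):
    """Sign of det[p1 p2 p3] for points p_i = (x_i, y_i, 1)^T.

    Returns +1 for anti-clockwise, -1 for clockwise, 0 for collinear points.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    return (det > 0) - (det < 0)


def order_type(points):
    """Map every index triple (i, j, k), i < j < k, to its orientation sign."""
    return {
        (i, j, k): orientation(points[i], points[j], points[k])
        for i, j, k in combinations(range(len(points)), 3)
    }


if __name__ == "__main__":
    shape = [(0, 0), (2, 0), (2, 1), (1, 3), (0, 1)]
    # An orientation-preserving affine map (determinant 3 > 0).
    warped = [(2 * x + y + 3, -x + y - 1) for x, y in shape]
    print(order_type(shape) == order_type(warped))
```

The comparison at the end checks the statement in the text that orientation-preserving affine transformations leave the order type unchanged.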

2.3 Sets of Lines 

Order structure for a set of lines in the plane can be defined in essentially the 
same way as for points by using the fact that they are projectively dual. Some 
care must be exercised however since order structure relies on oriented projec- 
tive geometry where duality does not hold strictly. If p are the homogeneous 
coordinates for a point, and q the homogeneous line coordinates of a line passing 
through p, we have 

\[ q^T p = 0 \tag{4} \]




Fig. 5. Order types and canonical orderings for 3, 4, and 5 points. “ ” means 
that ordering is ambiguous up to cyclic permutations 



In order to define order structure for lines in the same way as for points we have 
to normalise the homogeneous line coordinates q in some way. This will be done 
by choosing:

\[ q = (a,\; 1,\; b)^T \tag{5} \]

which means that we consider lines of the form:

\[ ax + y + b = 0 \tag{6} \]



The vertical line x + b = 0 therefore plays the same role as the point at infinity 
in the point case. All lines therefore become oriented relative to the vertical. The 
order type for a set of lines can now be represented by the signs: 

\[ \chi_l(i,j,k) = \operatorname{sign}\,[\,q_i \;\; q_j \;\; q_k\,] \in \{-1, 1\} \tag{7} \]



for all triplets of lines i, j, k in the set.

2.4 Combination of Points and Lines

By using oriented homogeneous coordinates for lines we introduce a direction 
for every line. We can therefore assign a left-right position for every point in the 
plane relative to every line. For line qi and point pj this is given by: 

\[ \chi_{pl}(i,j) = \operatorname{sign}(q_i^T p_j) \in \{-1, 1\} \tag{8} \]

For every arrangement of points and lines we can therefore talk about the order 
structure of combinations of points and lines represented by these signs. 
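
A minimal sketch of the point-line sign follows, assuming the normalisation in which the y coefficient of the line equals 1 (lines of the form ax + y + b = 0). The helper names `line_through` and `side` are hypothetical. With this normalisation a positive sign means that the point lies above the line; which side is called "left" depends on the orientation convention.

```python
def line_through(point, direction):
    """Normalised line coordinates q = (a, 1, b) of the line a*x + y + b = 0
    passing through `point` with tangent `direction` = (dx, dy).

    Vertical lines (dx == 0) cannot be normalised this way; they play the role
    of the element at infinity and are rejected here.
    """
    (x0, y0), (dx, dy) = point, direction
    if dx == 0:
        raise ValueError("vertical line has no (a, 1, b) representation")
    a = -dy / dx
    b = -(a * x0 + y0)
    return (a, 1.0, b)


def side(q, p):
    """Sign of q^T p for the homogeneous point (x, y, 1): +1, -1, or 0 on the line."""
    s = q[0] * p[0] + q[1] * p[1] + q[2]
    return (s > 0) - (s < 0)


if __name__ == "__main__":
    q = line_through((0, 0), (1, 1))  # the line y = x
    print(side(q, (0, 1)), side(q, (1, 0)))
```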



2.5 Order Structure Index from Points and Tangent Lines 

An arbitrary set of points and tangent lines can now be assigned a unique order
structure based on point, line and point-line order structure given by the sign
sets $\chi_p(i,j,k)$, $\chi_l(i,j,k)$ for all triplets of points and of lines in
the set, and $\chi_{pl}(i,j)$ for all pairs. If we start by computing the order
type for the set of points, we get a unique numbering from the canonical
ordering. In the case where ordering is ambiguous up to cyclic permutations we
choose the leftmost point as the first point. Given the numbering of the points,
we can compute all determinant signs for triplets of oriented line coordinates
and inner product signs for lines and points. These signs can be combined into
indexes which characterize the order structure of the point-line arrangement.
Note that points and lines play different roles in defining these indexes. The
order structure of the points is used for the numbering of the points and lines.
By giving an identity to each point and line, we increase the combinatorial
richness of the arrangement of lines and thereby its discriminatory power
compared to the case when the lines are not numbered.
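
The following Python sketch shows one way the signs could be combined into an index. It assumes that the features are already numbered (the canonical ordering is not reproduced here), that each tangent line is stored by its coefficient a in ax + y + b = 0, and that the index is a plain tuple of signs. All names and the packing scheme are assumptions made for illustration.

```python
from itertools import combinations


def sign(v):
    return (v > 0) - (v < 0)


def det3(u, v, w):
    """Determinant of the 3x3 matrix whose rows are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def order_structure_index(features):
    """features: numbered list of (x, y, a); the tangent at (x, y) is a*x + y + b = 0."""
    points = [(x, y, 1) for x, y, _ in features]
    lines = [(a, 1, -(a * x + y)) for x, y, a in features]
    ids = range(len(features))
    point_signs = tuple(sign(det3(points[i], points[j], points[k]))
                        for i, j, k in combinations(ids, 3))
    line_signs = tuple(sign(det3(lines[i], lines[j], lines[k]))
                       for i, j, k in combinations(ids, 3))
    # A point always lies on its own tangent line, so pairs with i == j are skipped.
    side_signs = tuple(sign(sum(q * p for q, p in zip(lines[i], points[j])))
                       for i in ids for j in ids if i != j)
    return point_signs + line_signs + side_signs


if __name__ == "__main__":
    feats = [(0, 0, 1), (4, 1, -2), (2, 5, 3), (-3, 2, 0), (1, -4, 5)]
    print(len(order_structure_index(feats)))
```

For five features this gives 10 point signs, 10 line signs and 20 point-line signs. A tuple is used so that the index can serve directly as a dictionary key.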

3 Combinatorial Geometric Hashing - Computing 
Correspondence 

Given subsets of points and lines from two shapes, we can compare their or- 
der structure. Order structure equivalence of subsets of points and tangents of 
an arbitrary shape can be used to establish point correspondence between the 
shapes. For two grey value images, we only have to sample points and tangents, 
i.e. there is no specific feature extraction or perceptual grouping stage neces- 
sary. The qualitative nature of the order structure concept means that metric 
accuracy of sampling point positions and tangent line directions is not required, 
leading to a robust scheme for computing correspondence. 

An algorithm for computing point correspondence between arbitrary shapes
in grey level images has been implemented using geometric hashing (Lamdan et
al. 1988) based on the combinatorial order structure of subsets of points and
tangent lines. The algorithm computes point correspondence between a given shape
and a “model shape” stored in the form of tables that can be indexed.
The first part of the algorithm is the construction of the model-shape index
table, and the second part is the actual indexing. The steps of the algorithm
are illustrated by fig. 7. Both the modeling and indexing stages are based on
sampling edge points of a shape and selecting all five point combinations. For a
given five point set, the canonical order type and an order structure index are
computed based on the point coordinates and the line coordinates of the tangent
lines of the edge points. The order structure index is used to identify the specific
point set in the model table. All five point sets with the same order structure
index are stored in the same entry given by the index.

Given a model table we can compute correspondence between the points in 
the table and the points that have been sampled from another shape. From 
the shape to be matched we sample five point combinations and compute order 
structure indexes just as in the modeling stage. The order structure index for a 
certain five point combination is used as a key to the model table and we note 
all the model five point combinations that are stored in the model table with 
that index. For each stored five point combination we vote for a correspondence 
to the five points from the shape to be matched. The end result is therefore a 
table of correspondence votes between the points of model shape and those of 
the shape to be matched. 
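
The two stages can be sketched as follows. This is a simplified illustration and not the implementation described above: the index uses only the orientation signs of point triples (tangent lines are omitted), and sorting each subset by coordinates stands in for the canonical ordering of fig. 5. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def orientation(p1, p2, p3):
    d = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1])
    return (d > 0) - (d < 0)


def ordered_subsets(points, size=5):
    """Yield every `size`-point subset as a tuple of indices sorted by (x, y).

    Assumption: this sort replaces the canonical ordering from the order type.
    """
    for subset in combinations(range(len(points)), size):
        yield tuple(sorted(subset, key=lambda i: points[i]))


def point_index(points, ordered):
    """Simplified index: orientation signs of all triples, in the given order."""
    return tuple(orientation(points[i], points[j], points[k])
                 for i, j, k in combinations(ordered, 3))


def build_model_table(model_points, index_fn=point_index):
    """Modeling stage: store each five-point set under its index."""
    table = defaultdict(list)
    for ordered in ordered_subsets(model_points):
        table[index_fn(model_points, ordered)].append(ordered)
    return table


def vote(table, query_points, n_model, index_fn=point_index):
    """Indexing stage: accumulate votes[q][m] between query and model points."""
    votes = [[0] * n_model for _ in query_points]
    for ordered in ordered_subsets(query_points):
        for model_ordered in table.get(index_fn(query_points, ordered), []):
            for q, m in zip(ordered, model_ordered):
                votes[q][m] += 1
    return votes


if __name__ == "__main__":
    shape = [(0, 0), (4, 1), (6, 4), (5, 8), (2, 9), (-1, 6), (-2, 3)]
    for row in vote(build_model_table(shape), shape, len(shape)):
        print(row)
```

Passing a richer `index_fn` that also includes the line and point-line signs would bring the sketch closer to the scheme described in the text.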

4 Results and Discussion 



Fig. 7. Combinatorial geometric hashing for computing correspondence

The algorithm for computing correspondence has been tested on a set of shapes
that were generated manually. Fig. 8 shows two shapes (A) and (B) of 12 points
and their tangent lines. (The points are located at the midpoints of the tangent
line segments.) The first table shows the normalised accumulated correspondence
votes for the points when the shape (A) is matched to itself. We get a dominant
diagonal in the table, as could be expected. We also get votes for incorrect
correspondences, but these are almost invariably associated with
matchings to nearest neighboring points. The second table shows the result of
A

123456789 10 11 12 

1 66 14 8- -- -- -- -- 

2 14 46 28 --------- 

38 28 56 --------- 



4- --6B 32 9- -- -- - 

5- -- 32 39 26 9----- 

A6--- 9264032----- 

7- --- 9 32 65----- 

8- ----- - 67 29- 9 

9- ----- - 2 39 429 

10 ------ - 94 41 - 8 

11------- - 2-27 1 

12 ------ - 9981 40 



B 

123456789 10 11 12 

1 19 52------- - 11 

2 48 22 14 - -- -- -- 39 

3 17 53 55--------- 

4-- - 78 33 7------ 

5 - - - 35 51 35 12 - - - - - 

A 6--- 9 33 46 31----- 

7--- - 5 34 80 ----- 

8 - - - - - - - 62 12 19 2 19 

9---- - 1--13- 2- 

10 ------ - 66 42 16 

11- ------ - 4 - 27 8 

12- ----- - 7- 7- 6 



Fig. 8. Normalised accumulated correspondence votes for a shape (A) matched 
to itself and a deformed version (B) 
correspondence when the shape (A) is matched to a deformed version (B). The
peaks in the correspondence table are all at perceptually “correct” correspon- 
dences with the exception of point 12. The tables in this figure as well as the 
following ones are normalised for readability. 

Fig. 9 shows a matching between pictures of a mug with a handle and a cup
with a handle. From the table we see that most points of the mug are matched
to perceptually corresponding points of the cup if the peak in the table is used
as the matching criterion. The table has been partitioned into shape parts and
there is a clear “parts” correspondence that can be read out from the table. A
major ambiguity concerns the inner and outer parts of the mug’s handle, which
are confused with the corresponding inner and outer parts of the cup’s handle
and with the cup itself.

Another example of matching instances of the same object category is shown
in fig. 10, where the outlines of the pictures of a goose and a crow are matched
to each other. The overall correspondence is perceptually plausible, with the
exception of the top of the goose's back matching the top of the head of the crow.

The results show that order structure of sampled points and tangent lines of
a shape can be used to establish point correspondence between shapes
that are projections of instances of the same object category. The basic reason
for this is that order structure is well preserved over shape deformations that
do not alter the perceptual category of the shape. It is therefore an interesting
alternative for use in recognition of shape based categories. Order structure can
be considered as relational structure of very low level features. In that respect
it is similar to shape category theories based on parts decomposition (Marr and
Nishihara 1978, Biederman 1985), where object categories are defined based on
relational structure of generic subparts. The fact that we use low level features,
however, means that we can bypass the stage of perceptual grouping that is
necessary to define subparts, which has proven to be very difficult to achieve in
a robust way in computer vision. The correspondences that can be established
between instances of the same object category using order structure indexing
could be used to define a similarity measure, which in turn could serve as a
basis for a system for categorical recognition (Rosch 1988, Edelman 1998).

The fact that we can bypass perceptual grouping is a major advantage of
our approach, since this is a major bottleneck in any recognition system. Since
order structure is invariant w.r.t. small perturbations of feature locations we also
gain in robustness, since no exact feature localisation is necessary. The
extraction of features, i.e. points and tangent directions, is essentially just a
sampling process.

The advantages of using indexing based on combinations of features in this
way of course come at the price of combinatorial complexity. By considering all
combinations of five point features, the complexity of the algorithm will grow
as $n^5$, where n is the number of features in the image. In practice this means
that we cannot consider more than 25-30 features at a time. A major issue is
therefore to find ways to reduce this complexity by being selective in the
choice of feature combinations and selecting only a subset of all possibilities.
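
To make the growth concrete, the number of five-point combinations can be counted directly. The short Python sketch below is illustrative only and the function name is an assumption. It counts subsets, not running time, and the voting stage adds a further cost that depends on how many model entries share an index.

```python
from math import comb


def five_point_subsets(n):
    """Number of distinct five-point combinations among n sampled features."""
    return comb(n, 5)


if __name__ == "__main__":
    for n in (12, 25, 30):
        print(n, five_point_subsets(n))
```

For the 12-point shapes of fig. 8 this gives 792 subsets, while 25 and 30 features give 53130 and 142506 subsets respectively.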
CUP



1234567 8 9 10 11 12 13 14 15 16 17 18 19 20 21 

1 41 41 ---16 14 ------ ---- ---- 

2 40 43-- 11 -18 ------ ---- ---- 

3 14 14 19 20--- ------ ---- ---- 

4 14 15 -20 17 -- ------ ---- ---- 

5 14 14 - 13 22 - - - -- -- - - -- - - -- - 

6---- - 30 24 - -- -- - - - 17 - - - 16- 

7 14 16 ---27 39 ------ --13- --12- 

8------- 96 53 21--- ---- ---- 

9 ------ - 69 55 38 16 - - - - - - - - - - 

10------- 49 51 45 28-- ---- ---- 

M 11 ------- 33 46 50 38-- ---- ---- 

U 12 ------- 19 38 48 47 13- ____ ____ 

G 13 ------- -19 32 40 16- ____ ____ 

14 _______ ___i2- - ____ ____ 

15 _______ ______ ____ ____ 

16 _______ - -- --12 ____ ____ 

17 _______ _____i2 ____ ____ 

18- ------ -12 19 26 13- ____ ____ 

19- ----- - - --20 19- -21- - -21-- 

20 ------ - - -- 13 20 14 - 26 - - - 25 -- 

21 - -- --16 14 ______ __43_ __40- 

22- ----- - ______ ___i5 ___i5 

23- ------ -- 12 19 15- ____ ____ 

24- ------ ---14 26 12 -24-- -24-- 

25 ----- 15 - ------ - - 38 - --43- 

26 ----- 17 18 ------ - - 34 - - - 36 - 

27------ - - -- -- - - -- - - -- - 



Fig. 9. Normalised accumulated correspondence votes for MUG and CUP 







CROW 

1 3 4 5 6 7 8 9 13 16 17 18 19 20 21 22 23 24 25 26 27 28 

118--------------------- 

2 19 - -- -- -- -- -- -- -- -- -- -- 

4 51 - -- -- -- -- -- -- -- -- -- -- 



5- 41-51------------------ 

6- 2049732322---------------- 

7- 19 15 70 44 42-- ------------- - 

8- --18434048-15------------- 

G 9---22434039-17------------- 

010- ---2222473738------------- 

011- -----413636------------- 

S12------446368------------- 

E 13 - -- -- -- 16 33 ------------- 

14 --------- 45 20 ----13 ------ 

15 - - - - - - - - - - - 32 25 31 31 - 23 21 14 - - 

16 - - - - - - - - - - 12 26 23 20 21 - 17 - - - - 

17 - - - - - - - - - - - 11 13 12 - 18 12 - 11 30 15 13 

18 --------------- 24 ---26 39 19 

19 ---------------- 19 11---- 

21 - -- -- -- -- -- -- -- -- 21 ---- 

22 - - - - - - - - - - - - - 13 14 15 - - 30 37 21 19 

23 --------------- 15 ---28 37 19 



Fig. 10. Normalised accumulated correspondence votes for GOOSE and CROW 





In a sense, this is exactly what classical perceptual grouping tries to achieve
by using rules of non-accidentalness of image features (Lowe 1985). Order
structure indexing implies a more general approach to this problem, since we do
not limit ourselves a priori to specific feature relations but can use arbitrary
combinations on the basis of their effectiveness in establishing correspondence.
Quasi-Invariant Parameterisations
and Their Applications in Computer Vision*
Abstract. In this paper, we show that there exist quasi-invariant
parameterisations which are not exactly invariant but approximately invariant
under group transformations and do not require high order derivatives. The
affine quasi-invariant parameterisation is investigated in more detail and
exploited for defining general affine semi-local invariants from second order
derivatives only. The new invariants are implemented and used for matching
curve segments under general affine motions and extracting symmetry axes of
objects with 3D bilateral symmetry.

1 Introduction 

Relative motion between an observer and the scene causes distortions in
images. These distortions can be described by specific transformation groups 1
such as Euclidean, affine and projective transformations. Thus geometric
invariants under these transformation groups are very important for object
recognition and identification 1 1 20 22 2 .

Although invariants on points have been studied extensively 1 22 32,
invariants on smooth curves and curved surfaces have not been explored enough
23 2 30 . The traditional invariants for smooth curves are differential
invariants 30 . Since these invariants require high order derivatives, many
methods have been studied to reduce the order of derivatives for defining
invariants on curves.

If we have a sufficient number of distinguished points on curves, these points
provide coordinate frames to normalise distorted curves without using
derivatives 32 . For planar curves, three distinguished points are required for
normalising affine distortions and four points for projective distortions. If we
use the differential property of curves, we can reduce the number of
distinguished points required for computing invariants on curves 1 20 2 . For
example, if we have first derivatives, two distinguished points are enough to
compute invariants under projective transformations 2 . However, finding
correspondences of distinguished points on the original and distorted curves is
a non-trivial problem.

To cope with these problems, semi-local invariants were also proposed 23 .
They showed that it is possible to define invariants semi-locally, by which the
order of derivatives in invariants can be reduced from that of group curvatures
to that of group arc-length without using any distinguished points on curves.
As we have seen in these works, the invariant parameterisation is important to
guarantee unique identification of corresponding intervals on curves.

* h utho knowl g th uppo t o th g nt /K 202
Although semi-local invariants reduce the order of derivatives required, it
is known that the order is still high in the general affine and projective cases
(see table 1). In this paper, we introduce a quasi-invariant parameterisation and
show how it enables us to use second order derivatives instead of fourth and
fifth. The idea of quasi-invariant parameterisation is to approximate the group
invariant arc-length by lower order derivatives. The new parameterisations are
therefore less sensitive to noise, and are approximately invariant under a
slightly restricted range of image distortions.

The concept of quasi-invariants was originally proposed by Binford 2 , who
showed that quasi-invariants enable a reduction in the number of corresponding
points required for computing algebraic invariants. For example, quasi-invariants
require only four points for computing planar projective invariants 2 , while
exact planar projective invariants require five points 1 . It has also been shown
that quasi-invariants exist even under the situation where the exact invariant
does not exist 2 . In spite of its potential, the quasi-invariant has not
previously been studied in detail. One reason for this is that the concept of
quasiness is rather ambiguous and is difficult to formalise. Furthermore, the
existing method is limited to the quasi-invariants based on point
correspondences 2 , or the quasi-invariants under specific models 31 .

In this paper, we investigate quasi-invariance on smooth manifolds, and show
that there exists a quasi-invariant parameterisation, that is a parameterisation
approximately invariant under group transformations. Although the approximated
values are no longer exact invariants, their changes are negligible for a
restricted range of transformations. Hence, the aim is to find parameterisations
which give the best tradeoff between the error caused by the approximation and
the error caused by image noise.

Following the motivation, we investigate a measure of invariance which
describes the difference from the exact invariant under group transformations. To
formalise a measure of invariance in differential formulae, we introduce the so
called prolongation 1 of vector fields. We next define a quasi-invariant
parameter as a function which minimises the difference from the exact invariant.
A quasi-invariant parameter under general affine transformations is then
proposed. The proposed parameter is applied to semi-local integral invariants
and exploited successfully for matching curves under general affine
transformations in real image sequences.

2 Semi-local Invariants 

If the invariants are too local, such as differential invariants 30 , they
suffer from noise. If the invariants are too global, such as moment (integral)
invariants 111 2 , they suffer from occlusion and the requirement of
correspondences. It has been shown recently 23 that it is possible to define
integral invariants semi-locally, so that they do not suffer from occlusion,
image noise and the requirement of correspondences.

Fig. 1. Identifying interval of integration semi-locally. (a) and (b) are images
of a Japanese character extracted from the first and the second viewpoints.
The interval of integration in these two images can be identified uniquely from
invariant arc-length. For example, an interval (— i,+ i) corresponds
to an interval ( — i, + i).

onsi a u V C to pa am t is y . t is a so possi to 

pa am t is t u v y inva iant pa am t s un sp i t ans o mation 

g oups. s a a a - ngt o t g oup. impo tant p op ty o 
g oup a - ngt is t at it na s us to i nti y t o spon ing int va o 
u V s automati a y. onsi a point C()onauvCto t ans o m to 
a point C() on a uv C yag oup t ans o mation as s own in ig. 1. Sin 

is an inva iant pa am t isation i w tak an int va ( — , ) on 

C an an int va ( — , + ) on C t n t s two int va s o spon 

to a ot (s ig. 1). at is y int g ating wit sp t to t g oup 

a - ngt t o spon ing int va o int g ation o t o igina an t 
t ans o m u v s an uniqu y i nti 

y using t inva iant pa am t isations w an ns mi- o a inva iants 

at point C( ) wit int va (— , ) as o ows 

pt-\-Aw 

() / ( 1 ) 

J t — Aw 



w is any inva iant un tion un t g oup. oi o p ovi s 

va ious kin s o s mi- o a inva iants 23 . w oos t un tion a u y 
t int g a o mu a (1) an so v ana yti a y an t su ting inva iants 
av simp o ms. o xamp in t an as w av t o owing s mi- 
o a inva iants 
Fig. 2. Semi-local invariants. In the affine case, semi-local invariants are
defined as the ratio of two areas defined by the invariant parameter.

( ) ^ ( 2 ) 

2l ) 

1 an 2 a t a as ma o t two s ts o two v to s C( -|- i) — C( ) 

an C( — i) — C( ) an C( -1- 2) — C( ) an C( — 2) — C( ) as o ows 

lO ^C(+ i)-c(),c(- i)-c() 

2O ic(+ 2)-C(),C(- 2)-C() 

w xi , X2 not s t t minant o a mat ix w i onsists o two o umn 

V to S Xi, X 2 R^. 

From table 1 19 it is clear that the semi-local invariants are useful under
Euclidean and special affine cases, but they still require high order
derivatives in general affine and projective cases.

The distortion caused by a group transformation is often not so large. For
example, the distortion caused by the relative motion between the observer and
the scene is restricted because of the finite speed of the camera or object
motions. In such cases, parameters approximated by lower order derivatives give
us a good approximation of the exact invariant parameterisation. We call such a
parameterisation a quasi-invariant parameterisation. In the following sections,
we define the quasi-invariant parameterisation, and derive an affine
quasi-invariant parameterisation.

Table 1. Order of derivatives required for the group arc-length and curvature.
In general, derivatives of more than the second order are sensitive to noise and
are not available from images. Thus the general affine and projective
arc-length, as well as curvatures, are not practical.



3 Infinitesimal Quasi-Invariance 

For deriving quasi-invariant parameterisations, we first consider the concept
of infinitesimal quasi-invariance, that is quasi-invariance under infinitesimal
group transformations.



3.1 Vector Fields of the Group 

L t a Lie group t at is a g oup wi aist stutu oa smoot 

mani o in su a way t at ot t g oup op ation (mu tip i ation) an t 
inv sion a smoot maps 1 . ans o mation g oups su as otation u- 
i an a n an p oj tiv g oups a Li g oups. 

onsi an imag point x , to t ans o m to x ~ y a 

g oup t ans o mation so t at a un tion ( , ) wit sp t to an 

00 inat s is t ans o m to y . 

n nit sima y t is is onsi as an a tion o a 2 v to v 



V 



'rr 



( 3 ) 



w Fan r/ a un tions oFan FLoayt o itot point x aus 
y t t ans o mation F is si y an int g a u v F o t v to 
V passing t oug t point (s ig. 3). uniqu n ss o an o ina y 

i ntia quation gua ant s t xist n o su a uniqu int g a u v in 
t V to 

aus o its in a ity in nit sima g n ato an si y t sum- 
mation o a nit num o in p n nt v to s Vi {i 1,2, ,F) o 

t g oup as o ows 

m 

V Ev. ( ) 

2=1 



Vi is t tt in p n nt V to 



F F 
^ FF 



( ) 



w Li an rji a asis o i nts o ^ an ^ sp tiv y an a un tions 
oFanF sinpnntvto soma nit im nsiona v to 
spa a a Lie algebra 1 . Lo a y any t ans o mation o t g oup an 
si y an int g a o a nit num o in p n nt v to s Vi. 

V to s i in (3) a ts as a i ntia op ato o t Li ivativ . 
Fig. 3. Vector field v and an integral curve $\Gamma$. The curve C is
transformed to $\tilde{C}$ by a group transformation, so that the point P on the
curve is transformed to $\tilde{P}$. Locally, the orbit of the point caused by a
group transformation coincides with the integral curve $\Gamma$ of the vector
field at the point P.


3.2 Exact Invariance 

Let v be an infinitesimal generator of the group transformation. A real-valued
function I is invariant under group transformations if and only if the Lie
derivative of I with respect to any infinitesimal generator v of the group G
vanishes as follows 1 :

\[ \mathcal{L}_v[I] = 0 \tag{6} \]

where $\mathcal{L}_v$ denotes the Lie derivative with respect to a vector field
v. Since I is a scalar function, the Lie derivative is the directional
derivative with respect to v. Thus the condition of invariance (6) can be
rewritten as follows:

\[ v[I] = 0 \tag{7} \]

where $v[\cdot]$ is the directional derivative with respect to v.




3.3 Infinitesimal Quasi-Invariance

i a o quasi-inva ian is to app oximat t xa t inva iant y a tain 

un tion F{r, I) w i is not xa t y inva iant ut n a y inva iant. t un - 

tion r is not xa t y inva iant t quation ( ) no ong os. an ow v 
m asu t i n om t xa t inva iant y using ( ). y nition t 
ang in un tion F aus y t in nit sima g oup t ans o mation in u 
y a V to V is si y t Li ivativ o P as o ows 
(10)



an t asimi op ato Fa n y t m t i t nso is in p n nt o t 
oi o t asis v to s 



w F^ is t inv s o Fij. at is t m t i F^ ang s a o ing to t 
oi o asis v to s v* so t at Fa is an inva iant. Sin F^ is symm t i t 
xists a oi o asis v to s Vi{i 1,2, ,F) y w i F^ is iagona is 

as o ows 




F 

F 



Su V to s Vi{i 1,2, ,F) a uniqu in t g oup F an t us 

int insi . y using t int insi v to s in ( ) w an m asu t ang 

in va u o a un tion F w i is int insi to t g oup F. 

o m asu ing t quasi inva ian o a un tion i sp tiv o t magnitu 
o t un tion w onsi t ang in un tion FF no ma is y t o igina 
un tion F. t us n a m asu o in nit sima quasi inva ian Q o a 
un tion F y t squa sum o no ma is ang s in un tion aus y t 

int insi v to s Vi(i 1,2, ,F) as o ows 




is is a m asu o ow inva iant t un tion is un t g oup t ans- 

o mation. Q is sma noug w a Fa quasi-inva iant un in nit sima 

g oup t ans o mations. 

n o tunat y i t g oup is not s mi-simp ( .g. g n a a n g oup g n- 
a in a g oup) t Ki ing o m is g n at an w o not av su int in- 
si v to s. ow V it is known t at a non-s mi-simp g oup is om- 

pos into a s mi-simp g oup an a a i a . us in su as s w oos a 

stovto swi o spon to t s mi-simp g oup an t a i a . 
4 Quasi-Invariance on Smooth Manifolds

In the last section, we introduced the concept of infinitesimal
quasi-invariance, which is the quasi-invariance under infinitesimal group
transformations, and derived a measure of the invariance of an approximated
function. Unfortunately, (11) is valid only for functions which do not include
derivatives. In this section, we introduce an important concept known as the
prolongation 1 of vector fields, and investigate quasi-invariance on smooth
manifolds, so that it enables us to define quasi-invariants with a differential
formula.

4.1 Prolongation of Vector Fields 

p o ongation is a m t o o inv stigating t i ntia wo om a g o- 
m t i point o vi w. L t a smoot u v C si y an in p n nt 

va ia F an a p n nt va ia F wit a smoot un tion F as o ows 

F F{F) 

u V C is t ans o m to C y a g oup t ans o mation F in u y a 
V to V as s own in ig. . onsi a It o p o ong spa w os 

00 inat s a F F an ivativ s o F wit sp t to F up to /t o so 

t at t p o ong spa is F -|- 2 im nsiona . u v s C an C in 2 

spa a p o ong an s i y spa u v s an in t F+2 

im nsiona p o ong spa . p o ong v to is a v to in 

F+2 im nsion w i a i st p o ong u v tot p o ong u v 

g(fe) 

xp i it y as s own in ig. . o p is y t Ft o p o ongation 

o a v to V is n so t at it t ans o ms t Ft o ivativ s 

o a un tion F F{F) into t o spon ing Ft o ivativ s 

o t t ans o m un tion F F{F) g om t i a y. 

L t Vi, {i 1,H^F) r in p n nt V to s in u y a g oup 
t ans o mation F. Sin t p o ongation is in a t R p o ongation 

(k) 

oagnavto van si ya sum o /t p o ongations v) 

o t in p n nt V to s as o ows 

m 

v<‘> ^v<‘> 

i=l 

onsi a V to ( ) in 2 spa again, ts st an s on p o ongations 
v(i) a omput as o ows 1 

Vi + {F^{r]i - FFa;) + FiFa;x)j^ (12) 

J-X 

V' ^ '^i F{F^{rji — FiFx) + FiFxxx)Y+ (1^) 

4 XX 

w F X an F'^ not t st an t s on tota ivativ s wit sp t 

to F an Fx Fxx Fxxx not t st s on an t t i ivativ s o F 







original image transformed image 




X 



prolonged space prolonged space 

Fig. 4. o ongation oavto . Ho po ong v to 
t ans o ms H o ivativ s o F into It o ivativ s o F. at is 

t p o ong u V is t ans o m into t p o ong u v y t 

p o ong V to . is na s us to inv stigat ivativ so un tions 

g om t i a y. not s H o p o ongation. is gu i ust at s t 

st o p o ongation [F 1). 



wit sp t to n L t F(r, rj 1^^^) a un tion o F Fan ivativ s o F wit 
sp t to F up to rt o w i is not y . Sin t p o ongation 
si s ow t ivativ s a going to ang un g oup t ans o mations 

w an omput t ang in un tion FF aus y t g oup t ans o mation 
F as o ows 

FT F 

w is t Ft o p o ongation o t in nit sima g n ato v o 

a t ans o mation F Not t at w qui on y t sam o o p o ongation 
as t at o t un tion F. Sin t p o ongation s i s ow ivativ s 
a going to ang it is impo tant o va uating t quasi-inva ian o a
i ntia o mu a as si in t n xt s tion. 



4.2 Quasi-Invariance on Smooth Manifolds 

L t us onsi t u V C in 2 spa again. Suppos is a un tion 

on t u V ontaining t ivativ s o F wit sp t to P up to t Ft 

o w i w not y . Sin t It o p o ongation o t 

V to V t ans o ms It o ivativ s I^”^ o t o igina u v to 

It o ivativ s 2^"^ o t t ans o m u v t ang in un tion 

nj(i^")) aus y t in nit sima g oup t ans o mation in u y t it 
in p n nt V to is si y 






(1 ) 



quasi-inva iant is a un tion w os va iation aus 
is ativ y sma ompa wit its o igina va u . 
inva ian Q on smoot u v C y t no ma is 
int g at a ong t u v C as o ows 



y g oup t ans o mations 
t us n a m asu o 

squa sum o H7(it")) 



Q 




(1 ) 



1(1^")) is os tot 
su o ow inva iant t 



xa t inva iant t n Q t n s to z o. us Q is a m a- 

un tion 1(1^"^) is un t g oup t ans o mation. 



5 Quasi-Invariant Parameterisation 



In the last section, we have derived quasi-invariance on smooth manifolds. We
now apply the results and investigate the quasi-invariance of parameterisation
under group transformations.







g oup 


a - ngt 


F 


0 a u V C is in g 


n a si y a g oup 


m 


t i 


F an 


t in 


p n 


nt va ia Tot 


u V as 0 ows 












IT ITT 




w 




FF an IT a 


t 


i ntia s o T an 


T sp tiv y. Suppos t 


m 


t i 


r is 


s i 


y t 


ivativ s 0 T wit 


sp t to T up to It 0 


as 


0 


ows 






T 




w 




jik) 


not s t 


It 0 


p 0 ongation o T 


ang 0 t i ntia 


fff. 


aus 


y t it 


in p n nt V to 


is t us iv y omputing 


t 


Li 


ivativ 0 IT wit 


sp t to t It 0 


p 0 ongation o 



FFFi vf ^ FFF (vf ^ F + I^)FF (1 ) 







ang in IT no ma is y IT its is si as o ows 

ITT, 



TTTi 



TT 



r + — 

T ^ ^ yT 



(1 ) 



m asu o inva ian o t pa am t T is t us si y int g ating 

t squa sum o ll Ti a ong t u v C as o ows 



Q / CTT 

Jc 



(1 ) 



^ E(^vf ) r + (19) 

i=l 

t pa am t is os to t xa t inva iant pa am t Q t n s to z o. 

t oug t is no xa t inva iant pa am t un ss it as noug o so 
ivativ s t sti xists a pa am t w i minimis s Q an qui s on y 
ow o ivativ s. a su a pa am t a quasi-inva iant pa am t 

o t g oup. n ssa y on ition o Q to av a minimum is t at its st 

va iation IQ vanis s 



TQa 0 ( 20 ) 

is is a va iationa p o m o on in p n nt va ia T an two p n nt 

va ia s an T. t is known t at (20) o s i an on y i its u -Lag ang 

vanis s as o ows 1 

£ C 0 ( 21 ) 

w £ not s t u op ato . 

n t n xt s tion w onsi an as an iv a m t i w i 

minimis sQun gnaant ans o mations. 

6 Affine Quasi-Invariant Parameterisation

In this section, we apply quasi-invariance to derive a quasi-invariant
parameterisation under general affine transformations which requires only second
order derivatives and is thus less sensitive to noise than the exact invariant
parameter, which requires fourth order derivatives.

Suppos t quasi-inva iant pa am t isation Pun gnaant ans- 
o mation is o s on o so t at t m t i Tot pa am t P is ma 
o ivativ s up to t s on 

TT I{T,T^)TT (22) 

w Tx an Txx at st an t s on ivativ so T wit sp t to P 
o n a quasi-inva iant pa am t is t us t sam as n ing a s on o 
Fig. 5. Prolongation of affine vector fields. (a), (b), (c) and (d) show the
original and the prolonged affine vector fields (i.e. divergence, curl, and two
deformation components).

i ntia un tion w i minimis s t quasi-inva ian Q un 

g n a a n t ans o mations. Sin t mti F is os on o w qui 
t s on o p o ongation o t v to s to omput t quasi-inva ian 
o t mti. 

6.1 Prolongation of Affine Vector Fields 

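A two dimensional general affine transformation is described by a 2 x 2 invertible
matrix $A \in GL(2)$ and a translational component $t \in \mathbb{R}^2$, and transforms
$x \in \mathbb{R}^2$ into $\tilde{x} \in \mathbb{R}^2$ as follows:

$$\tilde{x} = Ax + t$$

Since the differential form $d\tau$ in (22) does not include x and y components, it
is invariant under translations. Thus, we simply consider the action of
$A \in GL(2)$, which can be described by four independent vector fields $v_i$ (i = 1, ..., 4),
that is, the divergence, the curl, and the two components of deformation:

$$v_1 = x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y}, \qquad v_2 = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y},$$
$$v_3 = x\frac{\partial}{\partial x} - y\frac{\partial}{\partial y}, \qquad v_4 = y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y} \quad (23)$$

Since the general linear group GL(2) is not semi-simple, the Killing form (9) is
degenerate and there is no unique choice of vector fields of the group (see
section 3.3). It is however decomposed into the radical, which corresponds
to the divergence, and the special linear group SL(2), which is semi-simple and
whose intrinsic vector fields coincide with $v_2$, $v_3$ and $v_4$ in (23). Thus, we use
the vector fields in (23) for computing the quasi-invariance of differential forms
under general affine transformations.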

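From (12), (13) and (23), the second prolongations of these vector fields are
computed by:

$$v_1^{(2)} = v_1 - y_{xx}\frac{\partial}{\partial y_{xx}}$$
$$v_2^{(2)} = v_2 + (1 + y_x^2)\frac{\partial}{\partial y_x} + 3 y_x y_{xx}\frac{\partial}{\partial y_{xx}}$$
$$v_3^{(2)} = v_3 - 2 y_x\frac{\partial}{\partial y_x} - 3 y_{xx}\frac{\partial}{\partial y_{xx}}$$
$$v_4^{(2)} = v_4 + (1 - y_x^2)\frac{\partial}{\partial y_x} - 3 y_x y_{xx}\frac{\partial}{\partial y_{xx}} \quad (24)$$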



These are vector fields in four dimensions whose coordinates are x, y, $y_x$ and
$y_{xx}$, and the projection of these vector fields onto the x-y plane coincides with
the original affine vector fields (23) in two dimensions. Fig. 5 shows the
first prolongations of the affine vector fields. Unfortunately, the
second prolongations cannot be shown in figures since they are four dimensional
vector fields. However, the first prolongations are the projection of the second pro-
longations, and thus Fig. 5 may help readers to infer the structure of the second
prolongations.

Since $d\tau$ is of second order, the prolonged vector fields $v_1^{(2)}, \ldots, v_4^{(2)}$
describe how the parameter $\tau$ is going to change under general affine
transformations.

6.2 Affine Quasi-Invariant Parameterisation 

y su stitutmg t a n p o ong v to s v) V2 ' an V4 into 
(19) an so ving (21) w n t at IQ vanis s o any u v F F{F) i t 

o owing un tion P is os n 2 

F FxxH^ + lty^ (2) 

on u t at o any u v t o owing pa am t F is quasi-inva iant 
un g n a a n t ans o mations 

FF FxxH^ + iiy^rr (2) 

y o ma ising (2 ) w n t at t pa am t P is s i y t u- 

i an a - ngt FF an t u i an u vatu F as o ows 

FF FilF (2 ) 

us FF is in at an xa t inva iant un otation an quasi-inva iant un 
iv g n an o mation. Not it is known t at t inva iant pa am t un- 
simi a ity t ans o mations is FFF an t at o sp ia a n t ans o mations is 
[Figure 6 panels: (a) Integral invariants (std 0.1); (b) Differential invariants (std 0.1); (c) Integral invariants (std 0.5); (d) Differential invariants (std 0.5)]

Fig. 6. Results of noise sensitivity analysis. The invariant signatures of an arti-
ficial curve are derived from the proposed invariants (semi-local invariants based
on affine quasi-invariant parameterisation) and the affine differential invariants
(affine curvature), and are shown by thick lines in (a) and (b) respectively.
The dots in (a) and (b) show signatures after adding random Gaussian noise of std
0.1 pixels, and the dots in (c) and (d) show signatures after adding random
Gaussian noise of std 0.5 pixels. The thin lines show the uncertainty bounds of
the signatures estimated by the linear perturbation method. The signatures from
the proposed method are much more stable than those of differential invariants.



The derived quasi-invariant parameter $\tau$ for general affine transformations
is between these two, as expected. We call $\tau$ the affine quasi-invariant parameter
(arc-length). Since the new parameter requires only the second order derivatives,
it is expected to be less sensitive to noise than the exact invariant parameter
under general affine transformations.

y using F o F in (2) w an nan quasi s mi- o a inva iants. 

s inva iants qui on y s on o ivativ s an o not qui any 

istinguis points on u v s. 



7 Experiments 

7.1 Noise Sensitivity of Quasi Invariants 

We first compare the noise sensitivity of the semi-local invariants based on the
proposed affine quasi-invariant parameterisation with that of affine differential
invariants, i.e. affine curvature.

The invariant signatures of an artificial curve have been computed from the
proposed quasi-invariants and the affine curvature, and are shown by solid lines
in Fig. 6 (a) and (b). The dots in (a) and (b) show the invariant signatures after
adding random Gaussian noise of standard deviation of 0.1 pixels to the position
data of the curve, and the dots in (c) and (d) show those of standard deviation of
0.5 pixels. As we can see in these signatures, the proposed invariants are much
less sensitive to noise than the differential invariants. This is simply because
the proposed invariants require only second order derivatives while differential
invariants require fourth order derivatives. The thin lines show the results of
noise sensitivity analysis derived by the linear perturbation method.

7.2 Curve Matching Experiments 

Next we show preliminary results of curve matching experiments under relative
motion between an observer and objects.

Fig. 7 (a) and (b) show the images of natural leaves taken from two differ-
ent viewpoints. The white lines in these images show example contour curves
extracted from B-spline fitting. As we can see in these curves, because of the
viewer motion, the curves are distorted and occluded partially. Since the leaf is
nearly flat and the extent of the leaf is much less than the distance from the
camera to the leaf, we can assume that the corresponding curves are related by
a general affine transformation.

The computed invariant signatures of the original and the distorted curves
are shown in Fig. 7 (c). One of these two signatures was shifted horizontally
minimising the total difference between these two signatures. The corresponding
points on the contour curves were extracted by taking identical points in these
two signatures and are shown in Fig. 7 (d) and (e). Note that the extracted
corresponding curves are fairly accurate. In this experiment we have chosen
3 0 for computing invariant signatures.
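The horizontal alignment of two invariant signatures described above can be sketched as follows. This is only an illustration under stated assumptions: the signatures are taken to be uniformly sampled arrays, the "total difference" is taken to be the mean squared difference over the overlapping samples, and only integer shifts are searched. The function name `best_shift` and its parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def best_shift(sig_a, sig_b, max_shift, min_overlap=1):
    """Integer shift such that sig_b[j] best matches sig_a[j + shift].

    Returns (shift, cost), where cost is the mean squared difference over
    the overlapping samples (an assumed difference measure).
    """
    a = np.asarray(sig_a, dtype=float)
    b = np.asarray(sig_b, dtype=float)
    best_cost, best = np.inf, 0
    for shift in range(-max_shift, max_shift + 1):
        lo = max(0, shift)                  # first index of a in the overlap
        hi = min(len(a), len(b) + shift)    # one past the last index of a
        if hi - lo < min_overlap:
            continue
        diff = a[lo:hi] - b[lo - shift:hi - shift]
        cost = float(np.mean(diff ** 2))
        if cost < best_cost:
            best_cost, best = cost, shift
    return best, best_cost

# Synthetic example: sig_b is a partially occluded copy of sig_a.
t = np.linspace(0.0, 10.0, 300)
sig_a = np.sin(t) + 0.3 * t
sig_b = sig_a[25:225]
shift, cost = best_shift(sig_a, sig_b, max_shift=60, min_overlap=50)
```

Once the shift is known, samples with the same aligned index are treated as corresponding points on the two curves.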

7.3 Extracting Symmetry Axes 

n xt app y t quasi inva iants o xt a ting t symm t y ax s o t 
im nsiona o j ts. xt a ting symm t y 9 10 2 o o j ts in imag s is v y 
impo tant o ognising o j ts 1 29 o using att ntion 21 an ont o ing 

o ots 3 ia y. t is w known t at t o spon ing ontou u v s o a 

p ana i at a symm t y an si y sp ia a n t ans o mations 12 
2 . n t is s tion w onsi a ass o symm t y w i is si y a 
g n a a n t ans o mation. 

onsi a p ana o j t to av i at a symm t y wit an axis . Sup- 
pos t p ana o j t an s pa at into two p an s at t axis an is 

onn t y a ing so t at two p an s an otat a oun t is axis as s own 

in ig. (a). o j ts iv y otating t s two p an s av a 3 i at- 

a symm t y. is ass o symm t y is a so ommon in a ti ia an natu a 

o j ts su as utt fli s an ot flying ins ts. Sin t isto tion in imag s 
aus y a t im nsiona motion o a p ana o j t an si y a 
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) an ( ). w it 


in s in t s imag s 


s ow 


xt a t ontou 


U V s. 




quasi-inva iant a - 


ngt an s mi- o a 


inva iants a omput 


om t 
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v s in (a) an ( ) 


an a s own in ( ) 


y so i 


an as in s 


. ( ) an 


( ) 


s ow t 0 spon 


ing u v s xt a t 


om t 
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Rotation 



( ) 




Fig. 8. i at a symm t y wit otation. t an t ig t pa ts o an 

o j t wit i at a symm t y a otat wit sp t to t symm t y axis 

in (a). int s tion point i o two tang nt in s an l\ at o spon ing 

points 1 an i o a i at a symm t y wit otation i s on t symm t y 

ax s in ( ). w av oss points i{i 1, , ) t symm t y axis 

an omput y tting a in to t s oss points 12 n- 



g n a a n t ans o mation t is ass o symm t y an a so si y 

g n a a n t ans o mations un t w ak p sp tiv assumption. us t 
o spon ing two u v s o t is symm t y av t sam inva iant signatu s 

un g n a a n t ans o mations. 

n xt s ow t su ts o xt a ting symm t y ax s o 3 i at a sym- 
m t y. ig. 9 (a) s ows an imag o a utt fly {Small White) wit a flow . 

Sin t two wings o t utt fly a not op ana t o spon ing on- 

tou u V s o t two wings a at yagnaant ans o mation as 

si a ov . ig. 9 ( ) s ows xamp ontou u v s xt a t om (a). 

Not t at not a t points on t u v s av o spon n s aus o t 

a k o g ata an t p s n o spu ious g s. so i an as in s 
in ig. 9 ( ) s ow t inva iant signatu s omput om t t an t ig t 
wings in ( ) sp tiv y. ( n t is xamp w os ' 0 o omputing 

s mi- o a inva iants.) Sin t signatu s a inva iant up to a s i t w av 

simp y fl t an s i t on inva iant signatu o izonta y minimising t 

tota in tw n two signatu s 

s s own in t s signatu s s mi- o a inva iants as on quasi-inva iant 

pa am t isation a quit a u at an sta . o spon ing points a iv 
y taking t i nti a points on t s two signatu s an s own in ig. 9 ( ) 
y onn ting t o spon ing points, ang nt in s at v y o spon ing 

pai o points a omput an isp ay in ig. 9 ( ) y w it in s. oss 

points o V y pai o tang nt in s a xt a t an s own in ig. 9 ( ) y 

squa ots. symm t y axis o t utt fly is xt a t y tting a in to 
t oss points o tang nt in s an s own in ig. 9 ( ). t oug t xt a t 
ontou u V s in u asymm t i pa ts as s own in ( ) t omput axis 
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( ) o igin 1 im g 



( ) ontou u V 





( ) t ng nt lin ( ) ymm t y xi 



Fig. 9. xt a tion o axis o i at a symm t y wit otation. (a) s ows t 
o igina imag o a utt fly (Sma it ) p on a flow . ( ) s ows an 

xamp o ontou u v s. inva iant signatu sot s uvsa om- 

put om t quasi-inva iant a - ngt an s mi- o a inva iants. ( ) s ows 

t xt a t inva iant signatu sot tant igtuvsin(). 

a k in s in ( ) onn t pai so o spon ing points xt a t om t in- 
va iant signatu s in ( ). w it in s an t squa ots s ow t tang nt 
in s o t o spon ing points an t i oss points. w it in in ( ) 
s ows t symm t y axis o t utt fly xt a t y tting a in to t oss 

points. 








o symm t y ag s wit t o y o t utt fly quit w . as pu y 

g o a m t o s .g. mom nt as m t o s 9 10 wou not wo k in su 
as s. s su ts s ow t pow an us u n ss o t p opos s mi- o a 
inva iants an quasi-inva iant pa am t isation. 
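The axis extraction step used in this section (intersect the tangent lines at each pair of corresponding points, then fit a line to the resulting cross points) can be sketched as below. This is a minimal sketch, not the authors' implementation: lines are represented in homogeneous coordinates, the fit is a total least-squares fit via SVD, and the helper names `tangent_line`, `cross_point` and `fit_axis` are hypothetical.

```python
import numpy as np

def tangent_line(point, direction):
    """Homogeneous coordinates of the line through `point` along `direction`."""
    p = np.array([point[0], point[1], 1.0])
    q = np.array([point[0] + direction[0], point[1] + direction[1], 1.0])
    return np.cross(p, q)

def cross_point(line_a, line_b, eps=1e-12):
    """Intersection of two homogeneous lines, or None if (nearly) parallel."""
    h = np.cross(line_a, line_b)
    if abs(h[2]) < eps:
        return None
    return h[:2] / h[2]

def fit_axis(points):
    """Total least-squares line through points: (centroid, unit direction)."""
    P = np.asarray(points, dtype=float)
    centroid = P.mean(axis=0)
    _, _, vt = np.linalg.svd(P - centroid)
    return centroid, vt[0]

# Synthetic check: point pairs mirrored about the line x = 0.
ys = np.linspace(0.0, 4.0, 9)
slopes = np.linspace(0.5, 2.0, 9)
crosses = []
for y, m in zip(ys, slopes):
    left = tangent_line((-1.0, y), (-1.0, m))
    right = tangent_line((1.0, y), (1.0, m))
    crosses.append(cross_point(left, right))
centroid, direction = fit_axis(crosses)
```

In real images some tangent pairs are nearly parallel or come from spurious correspondences, so a robust fit would probably be preferable to the plain least-squares fit assumed here.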

8 Discussion 

In this paper, we have shown that there exist quasi-invariant parameterisations
which are not exactly invariant but approximately invariant under group trans-
formations and do not require high order derivatives. The affine quasi-invariant
parameterisation is derived and applied for matching of curves under the weak
perspective assumption.

Although the range of transformations is limited, the proposed method is use-
ful for many cases, especially for curve matching under relative motion between a
viewer and objects, since the movements of a camera and objects are in general
limited. We now discuss the properties of the proposed parameterisation.

1. Noise Sensitivity

Since quasi-invariant parameters enable us to reduce the order of derivatives
required, they are much less sensitive to noise than exact invariant parame-
ters. Thus using the quasi-invariant parameterisation is the same as finding
the best tradeoff between the systematic error caused by the approximation
and the error caused by the noise. The derived parameters are more feasible
than the traditional invariant parameters.

2. Limitation of the Amount of Motion 

p opos quasi-inva iant pa am t assum s t g oup motion to 
imit toasma amount, nt an as t is imitation is a out i 0 1 
3 01 an 4 Olot ivgnant o mation ompon nts 

(t is no imitation on t u ompon nt 2 )- Sin in many omput 

vision app i ations t isto tion o t imag is sma u to t imit 

sp o t ativ motion tw n a am a an t snot nit 
istan tw n two am as in a st o syst m w i v t p opos 
pa am t isation an xp oit in many app i ations. 
Abstract. Due to illumination variability, the same object can appear 
dramatically different even when viewed in fixed pose. Consequently, 
an object recognition system must employ a representation that is ei- 
ther invariant to, or models this variability. This chapter presents an 
appearance-based method for modeling this variability. In particular, we 
prove that the set of n-pixel monochrome images of a convex object with 
a Lambertian reflectance function, illuminated by an arbitrary number 
of point light sources at infinity, forms a convex polyhedral cone in $\mathbb{R}^n$ 
and that the dimension of this illumination cone equals the number of 
distinct surface normals. For a non-convex object with a more general 
reflectance function, the set of images is also a convex cone. Geometric 
properties of these cones for monochrome and color cameras are con- 
sidered. Here, we present a method for constructing a cone representation 
from a small number of images when the surface is continuous, possibly 
non-convex, and Lambertian; this accounts for both attached and cast 
shadows. For a collection of objects, each object is represented by a cone, 
and recognition is performed through nearest neighbor classification by 
measuring the minimal distance of an image to each cone. We demon- 
strate the utility of this approach to the problem of face recognition (a 
class of non-convex and non-Lambertian objects with similar geometry). 
The method is tested on a database of 660 images of 10 faces, and the 
results exceed those of popular existing methods. 



1 Introduction 

One of the complications that has troubled computer vision recognition al- 
gorithms is the variability of an object’s appearance from one image to the 
next. With slight changes in lighting conditions and viewpoint often come large 
changes in the object’s appearance. To handle this variability methods usually 
take one of two approaches: either measure some property in the image of the 
object which is, if not invariant, at least insensitive to the variability in the imag- 
ing conditions, or model the object, or part of the object, in order to predict the 
variability. 

Nearly all approaches to object recognition have handled the variability due 
to illumination by using the first approach; they have, for example, concentrated 
on edges, i.e. the discontinuities in the image intensity. Because discontinuities in 
the albedo on the object’s surface or discontinuities in albedo across the object’s 
boundary generate edges in images, these edges tend to be insensitive to a range 
of illumination conditions [5]. 

Yet, edges do not contain all of the information useful for recognition. Fur- 
thermore, objects which are not simple polyhedra or are not composed of piece- 
wise constant albedo patterns often produce inconsistent edge maps. The top of 
Fig. 1 shows two images of a person with the same facial expression and pho- 
tographed from the same viewpoint. The variability in these two images due to 
differences in illumination is dramatic: not only does it lead to a change in con- 
trast, but also to changes in the configuration of the shadows, i.e. certain regions 
are shadowed in the left image, but illuminated in the right, and vice versa. The 
edge maps in the bottom half of Fig. 1 are produced from these images. Due 
to the variation in illumination, only a small fraction of the edges are common 
between images. Figure 9 shows another example of extreme illumination vari- 
ation in images; in this case observe the extreme variability in the images of a 
single individual illuminated by a single light source in different locations. 

The reason most approaches have avoided using the rest of the intensity 
information is because its variability under changing illumination has been diffi- 
cult to tame. Methods have recently been introduced which use low-dimensional 
representations of images to perform recognition, see for example [15, 27, 39]. 
These methods, often termed appearance-based methods, differ from feature- 
based methods in that their low-dimensional representation is, in a least-squared 
sense, faithful to the original image. Systems such as SLAM [27] and Eigenfaces 
[39] have demonstrated the power of appearance-based methods both in ease of 
implementation and in accuracy. Yet these methods suffer from an important 
drawback: recognition of an object (or face) under a particular pose and lighting 
can be performed reliably provided that object has been previously seen under 
similar circumstances. In other words, these methods in their original form have 
no way of extrapolating to novel viewing conditions. Yet, if one enumerates 
all possible poses and permutes these with all possible illumination conditions, 
things get out of hand quite quickly. This raises the question: Is there some un- 
derlying “generative” structure to the set of images of an object under varying 
illumination and pose such that to create the set, the object does not have to be 
viewed under all possible conditions? 

In this chapter we address only part of this question, restricting our investi- 
gation to varying illumination. In particular, if an image with n pixels is treated 

[Figure 1 panels: Original Images (top row); Edge Maps (bottom row)]

Fig. 1. Effects of Variability in Illumination: The top two images show the 
same face seen under different illumination conditions. The bottom two images 
show edge maps of the top two images. Even though the change in light source 
direction is less than 45°, the change in the resulting image is dramatic. 



as a point in $\mathbb{R}^n$, what is the set of all images of an object under varying il- 
lumination? Is this set an incredibly complex, but low-dimensional manifold in 
the image space? Or does the set have a simple, predictable structure? If the 
object is convex in shape and has a Lambertian reflectance function, can a finite 
number of images characterize this set? If so, how many images are needed? 

The image formation process for a particular object can be viewed as a func- 
tion of pose and lighting. Since an object’s pose can be represented by a point in 
$\mathbb{R}^3 \times SO(3)$ (a six dimensional manifold), the set of n-pixel images of an object 

under constant illumination, but over all possible poses, is at most six dimen- 
sional. Murase and Nayar take advantage of this structure when constructing 
appearance manifolds [27]. However, the variability due to illumination may be 
much larger as the set of possible lighting conditions is infinite dimensional. 

Arbitrary illumination can be modeled as a scalar function on a four dimen- 
sional manifold of light rays [25]. However, without limiting assumptions about 
the possible light sources, the bidirectional reflectance density functions, or object 
geometry, it is difficult to draw limiting conclusions about the set of images. 
For example, the image of a perfect mirror can be anything. Alternatively, if the 
light source is composed of a collection of independent lasers with one per pixel 
(which is admissible under [25]), then an arbitrary image of any object can be 
constructed by appropriately selecting the lasers’ intensities. 

Nonetheless, we will show that the set of images of an object with arbitrary 
reflectance functions seen under arbitrary illumination conditions is a convex 
cone in $\mathbb{R}^n$, where n is the number of pixels in each image. Furthermore, if 
the object has a convex shape and a Lambertian reflectance function and is 
illuminated by an arbitrary number of point light sources at infinity, this cone is 
polyhedral and can be determined from as few as three images. In addition, we 
will show that while the dimension of the illumination cone equals the number 
of distinct surface normals, the shape of the cone is “flat,” i.e. the cone lies near 
a low dimensional linear subspace of the image space. When the object is non- 
convex and non-Lambertian, methods for approximating the cone are presented. 
Throughout the chapter, empirical investigations are presented to complement 
the theoretical arguments. In particular, experimental results are provided which 
support the validity of the illumination cone representation and the associated 
propositions on the illumination cone’s dimension and shape. Note that some 
results in this chapter were originally presented in [4, 14]. 

The effectiveness of these algorithms and the cone representation is demon- 
strated within the context of face recognition - it has been observed by Moses, 
Adini and Ullman that the variability in an image due to illumination is often 
greater than that due to a change in the person’s identity [26]. Figure 9 shows the 
variability for a single individual. It has also been observed that methods for face 
recognition based on finding local image features and using their geometric rela- 
tion are generally ineffective [6]. Hence, faces provide an interesting and useful 
class of objects for testing the power of the illumination cone representation. 

In this chapter we empirically compare this new method to a number of 
popular techniques and representations such as correlation [6] and Eigenfaces [24, 
39] as well as more recently developed techniques such as distance to linear 
subspace [3, 15, 29, 34]; the latter technique has been shown to be much less 
sensitive to illumination variation than the former. However, these methods also 
break down as shadowing becomes very significant. As we will see, the presented 
algorithm based on the illumination cone outperforms all of these methods on a 
database of 660 images. It should be noted that our objective in this work is to 
focus solely on the issue of illumination variation whereas other approaches have 
been more concerned with issues related to large image databases, face finding, 
pose, and facial expressions. 

We are hopeful that the proposed illumination representation will prove use- 
ful for 3-D object recognition under more general conditions. For problems where 
pose is unknown, we envision marrying the illumination cone representation with 
a low-dimensional set of image coordinate transformations [40] or with the ap- 
pearance manifold work of [27], thus allowing both illumination and pose varia- 
tion. For problems in which occlusion and non-rigid motion cannot be discounted, 
we envision breaking the image of an object into sub-regions and building “illu- 
mination sub-cones.” These illumination sub-cones could then be glued together 
in a manner similar to the recent “body plans” work of [12]. 

2 The Illumination Cone 

In this section, we develop the illumination cone representation. To start, we 
make two simplifying assumptions: first, we assume that the surfaces of objects 
have Lambertian reflectance functions; second, we assume that the shape of 
an object’s surface is convex. While the majority of the propositions are based 
upon these two assumptions, we will relax them in Section 2.2 and show that the 
set of images is still a convex cone. In addition, the empirical investigations of 
Section 2.4 will demonstrate the validity of the illumination cone representation 
by presenting results on images of objects which have neither purely Lambertian 
reflectance functions nor convex shapes. The cone representation will then be 
used for face recognition in Section 5. 



2.1 Illumination Cones for Convex Lambertian Surfaces 

To begin, let us assume a Lambertian model for reflectance with a single point 
light source at infinity. Let $x \in \mathbb{R}^n$ denote an image with n pixels. Let $B \in \mathbb{R}^{n \times 3}$ 
be a matrix where each row of B is the product of the albedo with the inward 
pointing unit normal for a point on the surface projecting to a particular pixel; 
here we effectively approximate a smooth surface by a faceted one and assume 
that the surface normals for the set of points projecting to the same image pixel 
are identical. 

Let $s \in \mathbb{R}^3$ be a column vector signifying the product of the light source 
strength with the unit vector for the light source direction. Thus, a convex 
object with surface normals and albedo given by B, seen under illumination s, 
produces an image x given by the following equation 

$$x = \max(Bs, 0), \quad (1)$$

where $\max(\cdot, 0)$ zeros all negative components of the vector Bs [18]. Note that 
the negative components of Bs correspond to the shadowed surface points and 
are sometimes called attached shadows [33]. Also, note that we have assumed that 
the object’s shape is convex at this point to avoid cast shadows, i.e. shadows that 
the object casts on itself. 

If the object is illuminated by k point light sources at infinity, the image x 
is given by the superposition of images which would have been produced by the 
individual light sources, i.e. 

$$x = \sum_{i=1}^{k} \max(B s_i, 0),$$

where $s_i$ is a single light source. Note that extended light sources at infinity can 
be handled by allowing an infinite number of point light sources (i.e., the sum 
becomes an integral). 
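The superposition over k sources can be sketched in the same way. The point worth noticing is that the clamping is applied per source before summing; clamping the image of the summed source vector gives a different result. The function name and the small example below are assumptions made for illustration.

```python
import numpy as np

def render_multi_source(B, sources):
    """Sum of the single-source images max(B s_i, 0) over all sources."""
    B = np.asarray(B, dtype=float)
    S = np.asarray(sources, dtype=float).reshape(-1, 3)   # shape (k, 3)
    per_source = np.maximum(B @ S.T, 0.0)                 # shape (n, k)
    return per_source.sum(axis=1)

# Hypothetical 2-pixel surface and two sources on opposite sides.
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
s1 = np.array([1.0, 0.0, 1.0])
s2 = np.array([-1.0, 0.0, 1.0])

image = render_multi_source(B, [s1, s2])
naive = np.maximum(B @ (s1 + s2), 0.0)   # clamps after summing the sources
```

Here the first pixel is lit by s1 and shadowed under s2, so its value is 1 in `image` but 0 in `naive`.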

The product of B with all possible light source directions and strengths 
sweeps out a subspace in the n-dimensional image space [17, 29, 33]; we call the 
subspace created by B the illumination subspace $\mathcal{L}$, where 

$$\mathcal{L} = \{x \mid x = Bs,\ \forall s \in \mathbb{R}^3\}.$$

Note that the dimension of $\mathcal{L}$ equals the rank of B. Since B is an n x 3 matrix, $\mathcal{L}$ 
will in general be a 3-D subspace, and we will assume it to be so in the remainder 
of the chapter. When the surface has fewer than three linearly independent sur- 
face normals, B does not have full rank. For example, in the case of a cylindrical 
object, both the rank of B and dimension of $\mathcal{L}$ are two. Likewise, in the case of 
a planar object, both the rank and dimension are one. 
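The three cases (general, cylindrical, planar) can be checked numerically by computing the rank of B. The matrices below are small made-up examples, and `illumination_subspace_dim` is a hypothetical helper name.

```python
import numpy as np

def illumination_subspace_dim(B):
    """Dimension of the illumination subspace, i.e. the rank of B."""
    return int(np.linalg.matrix_rank(np.asarray(B, dtype=float)))

albedo = np.array([0.3, 0.5, 0.7, 0.9])

# Planar object: every pixel shares one normal, only the albedo varies.
planar = albedo[:, None] * np.array([0.0, 0.0, 1.0])

# Cylindrical object: normals rotate within the x-z plane.
theta = np.array([0.4, 1.0, 1.6, 2.2])
cylinder = albedo[:, None] * np.column_stack(
    [np.cos(theta), np.zeros_like(theta), np.sin(theta)])

# General object: normals that are not all coplanar.
general = np.array([[0.0, 0.0, 1.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 1.0, 1.0],
                    [1.0, 1.0, 1.0]])

dims = [illumination_subspace_dim(M) for M in (planar, cylinder, general)]
```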

When a single light source is parallel with the camera’s optical axis, all 
visible points on the surface are illuminated, and consequently, all pixels in the 
image have non-zero values. The set of images created by scaling the light source 
strength and moving the light source away from the direction of the camera’s 
optical axis such that all pixels remain illuminated can be found as the relative 
interior of a set $\mathcal{L}_0$ defined by the intersection of $\mathcal{L}$ with the non-negative orthant^1 
of $\mathbb{R}^n$. 

Lemma 1. The set of images $\mathcal{L}_0$ is a convex cone in $\mathbb{R}^n$. 

Proof. $\mathcal{L}_0 = \mathcal{L} \cap \{x \mid x \in \mathbb{R}^n$, with all components of $x \geq 0\}$. Both $\mathcal{L}$ and the 
non-negative orthant are convex. For the definition of convexity and the definition 
of a cone, see [7, 31]. Because the intersection of two convex sets is convex, it 
follows that $\mathcal{L}_0$ is convex. 

Because $\mathcal{L}$ is a linear subspace, if $x \in \mathcal{L}$ then $\alpha x \in \mathcal{L}$. And, if x has all 
components non-negative, then $\alpha x$ has all components non-negative for every 
$\alpha \geq 0$. Therefore $\alpha x \in \mathcal{L}_0$. So it follows that $\mathcal{L}_0$ is a cone. 
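The two conditions in the proof (membership of the subspace spanned by the columns of B, and non-negativity of every component) can be tested numerically. The sketch below uses a least-squares solve for the subspace test; the function name `in_L0`, the tolerance, and the matrix B are illustrative assumptions.

```python
import numpy as np

def in_L0(B, x, tol=1e-9):
    """True if x lies in the column space of B and has no negative component."""
    B = np.asarray(B, dtype=float)
    x = np.asarray(x, dtype=float)
    s, *_ = np.linalg.lstsq(B, x, rcond=None)
    residual = np.linalg.norm(B @ s - x)
    in_subspace = residual <= tol * max(1.0, np.linalg.norm(x))
    return bool(in_subspace and np.all(x >= -tol))

B = np.array([[0.0, 0.0, 1.0],
              [1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x1 = B @ np.array([0.0, 0.0, 1.0])    # all pixels lit
x2 = B @ np.array([0.5, 0.0, 1.0])    # all pixels lit
combo = 3.0 * x1 + 0.5 * x2           # non-negative combination
shadowed = B @ np.array([2.0, 0.0, 1.0])     # has a negative component
off_subspace = np.array([1.0, 2.0, 2.0, 1.0])  # not of the form B s
```

Scaling and adding members of the set keeps both conditions satisfied, which is the numerical counterpart of the convex-cone statement.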

As we move the light source direction further from the camera’s optical axis, 
points on the object will fall into shadow. Naturally, which pixels are the image 
of shadowed or illuminated surface points depends on where we move the light 
source direction. If we move the light source all the way around to the back of 
the object so that the camera’s optical axis and the light source are pointing in 
opposite directions, then all pixels are in shadow. 

Let us now consider all possible light source directions, representing each 
direction by a point on the surface of the sphere; we call this sphere the illumi- 
nation sphere. For a convex object, the set of light source directions for which a 
given facet (i.e. pixel in the image) is illuminated corresponds to an open hemi- 
sphere of the illumination sphere; the set of light source directions for which the 

^1 By orthant we mean the high-dimensional analogue to quadrant, i.e., the set $\{x \mid x \in \mathbb{R}^n$, 
with certain components of $x \geq 0$ and the remaining components of $x < 0\}$. By 
non-negative orthant we mean the set $\{x \mid x \in \mathbb{R}^n$, with all components of $x \geq 0\}$. 

Fig. 2. The Illumination Sphere: The set of all light source directions can be 
represented by points on the surface of a sphere; we call this sphere the illumina- 
tion sphere. Great circles corresponding to individual pixels divide the illumina- 
tion sphere into cells of different shadowing configurations. The arrows indicate 
the hemisphere of light directions for which the particular pixel is illuminated. 
The cell of light source directions which illuminate all pixels is denoted by $S_0$. 
The light source directions within $S_0$ produce $\mathcal{L}_0$, the set of images in which all 
pixels are illuminated. Each of the other cells produce the $\mathcal{L}_i$, $0 < i \leq n(n-1)+1$. 
The extreme rays of the cone are given by the images produced by light sources 
at the intersection of two circles. 



facet is shadowed corresponds to the other hemisphere of points. A great circle 
on the illumination sphere divides these sets. 

For each of the n pixels in the image, there is a corresponding great circle 
on the illumination sphere. The collection of great circles carves up the surface 
of the illumination sphere into a collection of cells Si. See Figure 2. The col- 
lection of light source directions contained within a cell Si on the illumination 
sphere produces a set of images, each with the same pixels in shadow and the 
same pixels illuminated; we say that these images have the same “shadowing 
configurations.” Different cells produce different shadowing configurations. Note 
that this partitioning is reminiscent of the partitioning of the viewpoint space in 
the construction of orthographic projection aspect graphs of convex polyhedral 
objects [41]. 
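One way to get a feel for this partitioning is to sample light source directions on the illumination sphere and record which pixels are lit for each sample; each distinct lit/shadowed pattern is one shadowing configuration. This is a Monte Carlo sketch under stated assumptions, not a method from the chapter: very small cells may be missed, so the count is a lower bound, and the function name and sample size are arbitrary.

```python
import numpy as np

def shadow_configurations(B, num_samples=5000, seed=0):
    """Set of lit/shadowed patterns seen over random unit light directions."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(num_samples, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    lit = (np.asarray(B, dtype=float) @ dirs.T) > 0.0   # shape (n, samples)
    return {tuple(pattern) for pattern in lit.T.tolist()}

# Three mutually orthogonal normals: the great circles cut the sphere
# into the eight octants.
three_normals = np.eye(3)
two_normals = np.eye(3)[:2]

configs3 = shadow_configurations(three_normals)
configs2 = shadow_configurations(two_normals)
```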

We denote by $S_0$ the cell on the illumination sphere containing the collection 
of light source directions which produce images with all pixels illuminated. Thus, 
the collection of light source directions from the interior and boundary of $S_0$ 
produces the set of images $\mathcal{L}_0$. To determine the set of images produced by 
another cell on the illumination sphere, we need to return to the illumination 
subspace $\mathcal{L}$. 

The illumination subspace $\mathcal{L}$ not only slices through the non-negative orthant 
of $\mathbb{R}^n$, but other orthants in $\mathbb{R}^n$ as well. Let $\mathcal{L}_i$ be the intersection of the 
illumination subspace $\mathcal{L}$ with an orthant i in $\mathbb{R}^n$ through which $\mathcal{L}$ passes. Certain 
components of $x \in \mathcal{L}_i$ are always negative and others always greater than or 
equal to zero. Each $\mathcal{L}_i$ has a corresponding cell of light source directions $S_i$ on 
the illumination sphere. Note that $\mathcal{L}$ does not slice through all of the $2^n$ orthants 
in $\mathbb{R}^n$, but at most $n(n-1)+2$ orthants (see the proof of Proposition 1). Thus, 
there are at most $n(n-1)+2$ sets $\mathcal{L}_i$, each with a corresponding cell $S_i$ on the 
illumination sphere. 

The set of images produced by the collection of light source directions from a 
cell $S_i$ other than $S_0$ can be found as a projection $P_i$ of all points in a particular 
set $\mathcal{L}_i$. The projection $P_i$ is such that it leaves the non-negative components of 
$x \in \mathcal{L}_i$ untouched, while the negative components of x become zero. We denote 
the projected set by $P_i(\mathcal{L}_i)$. 

Lemma 2. The set of images $P_i(\mathcal{L}_i)$ is a convex cone in $\mathbb{R}^n$. 

Proof. By the same argument used in the proof of Lemma 1, $\mathcal{L}_i$ is a convex cone. 
Since the linear projection of a convex cone is itself a convex cone, $P_i(\mathcal{L}_i)$ is a 
convex cone. 

Since $P_i(\mathcal{L}_i)$ is the projection of $\mathcal{L}_i$, it is at most three dimensional. Each 
$P_i(\mathcal{L}_i)$ is the set of all images such that certain facets are illuminated, and the 
remaining facets are shadowed. The dual relation between $P_i(\mathcal{L}_i)$ and $S_i$ can be 
concisely written as $P_i(\mathcal{L}_i) = \{\alpha \max(Bs, 0) : \alpha \geq 0, s \in S_i\}$ and 
$S_i = \{s : |s| = 1, \max(Bs, 0) \in P_i(\mathcal{L}_i)\}$. Let $P_0$ be the identity, so that $P_0(\mathcal{L}_0) = \mathcal{L}_0$ is the set of 
all images such that all facets are illuminated. The number of possible shadowing 
configurations is the number of orthants in $\mathbb{R}^n$ through which the illumination 
subspace $\mathcal{L}$ passes, which in turn is the same as the number of sets $P_i(\mathcal{L}_i)$. 

Proposition 1. The number of shadowing configurations is at most $m(m-1)+2$, 
where $m \leq n$ is the number of distinct surface normals. 

Proof. Each of the n pixels in the image has a corresponding great circle on 
the illumination sphere, but only $m \leq n$ of the great circles are distinct. The 
collection of m distinct great circles carves the surface of the illumination sphere 
into cells. Each cell on the illumination sphere corresponds to a particular set 
of images $P_i(\mathcal{L}_i)$. Thus, the problem of determining the number of shadowing 
configurations is the same as the problem of determining the number of cells. 
If every vertex on the illumination sphere is formed by the intersection of only 
two of the m distinct great circles (i.e., if no more than two surface normals 
are coplanar), then it can be shown by induction that the illumination sphere is 
divided into m(m — 1) + 2 cells. If a vertex is formed by the intersection of three 
or more great circles, there are fewer cells. 
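
The bound in Proposition 1 can be checked numerically. The following is a minimal sketch, assuming the light source directions are sampled densely at random on the illumination sphere; `count_shadow_configs` is a hypothetical helper name, not something defined in the text. Because very small cells can be missed by sampling, the count it returns is only a lower bound on the true number of cells.

```python
import numpy as np

def count_shadow_configs(B, n_samples=200_000, seed=0):
    """Count distinct illuminated/shadowed patterns of B s over sampled unit vectors s."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=(n_samples, 3))
    s /= np.linalg.norm(s, axis=1, keepdims=True)     # points on the illumination sphere
    lit = ((s @ B.T) > 0).astype(np.uint8)            # (n_samples, m) mask of lit facets
    return len(np.unique(lit, axis=0))                # one row per shadowing configuration

# Three mutually orthogonal normals (an assumed toy case): the three great
# circles cut the sphere into octants, matching m(m - 1) + 2 with m = 3.
m = 3
print(count_shadow_configs(np.eye(m)), m * (m - 1) + 2)
```

For normals in general position the sampled count approaches $m(m-1)+2$ as the number of samples grows; it can never exceed that bound.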

Thus, the set $\mathcal{U}$ of images of a convex Lambertian surface created by varying
the direction and strength of a single point light source at infinity is given by
the union of at most $n(n-1)+2$ convex cones, i.e.,

$$\mathcal{U} = \{x \mid x = \max(Bs, 0), \forall s \in \mathbb{R}^3\} = \bigcup_{i=0}^{n(n-1)+1} P_i(\mathcal{L}_i). \qquad (2)$$

From this set, we can construct the set $\mathcal{C}$ of all possible images of a convex
Lambertian surface created by varying the direction and strength of an arbitrary
number of point light sources at infinity,

$$\mathcal{C} = \{x : x = \sum_{i=1}^{k} \max(Bs_i, 0), \forall s_i \in \mathbb{R}^3, \forall k \in \mathbb{Z}^+\}$$

where $\mathbb{Z}^+$ is the set of positive integers.

Proposition 2. The set of images $\mathcal{C}$ is a convex cone in $\mathbb{R}^n$.

Proof. The proof that $\mathcal{C}$ is a cone follows trivially from the definition of $\mathcal{C}$. To
prove that $\mathcal{C}$ is convex, we appeal to a proposition for convex cones which states
that a cone $\mathcal{C}$ is convex iff $x_1 + x_2 \in \mathcal{C}$ for any two points $x_1, x_2 \in \mathcal{C}$ [7]. So the
proof that $\mathcal{C}$ is convex also follows trivially from the above definition of $\mathcal{C}$.
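
Spelled out, the two closure properties used in this proof read as follows. This is a short worked restatement of the definition of the set of multi-source images, where $x$ and $y$ are generic members built from light sources $s_i$ and $t_j$ (the names $y$, $t_j$ and $l$ are introduced here only for illustration):

```latex
x = \sum_{i=1}^{k} \max(Bs_i, 0), \qquad y = \sum_{j=1}^{l} \max(Bt_j, 0)
\\[4pt]
\alpha x = \sum_{i=1}^{k} \max\bigl(B(\alpha s_i), 0\bigr) \in \mathcal{C}
\quad \text{for } \alpha \ge 0,
\\[4pt]
x + y = \sum_{i=1}^{k} \max(Bs_i, 0) + \sum_{j=1}^{l} \max(Bt_j, 0) \in \mathcal{C}.
```

The first line uses $\max(B(\alpha s), 0) = \alpha \max(Bs, 0)$ for $\alpha \ge 0$; the second holds because the right-hand side is itself a sum of $k + l$ single-source images.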

We call C the illumination cone. Every object has its own illumination cone. 
Note that each point in the cone is an image of the object under a particular 
lighting configuration, and the entire cone is the set of images of the object under 
all possible configurations of point light sources at infinity. 

Proposition 3. The illumination cone C of a convex Lambertian surface can 
be determined from as few as three images, each taken under a different, but 
unknown light source direction. 

Proof. The illumination cone $\mathcal{C}$ is completely determined by the illumination
subspace $\mathcal{L}$. If the matrix $B$ of surface normals scaled by albedo were known,
then this would determine $\mathcal{L}$ uniquely, as $\mathcal{L} = \{x \mid x = Bs, \forall s \in \mathbb{R}^3\}$. Yet, from
images produced by differing, but unknown light source directions, we cannot
determine $B$ uniquely. To see this, note that for any arbitrary invertible $3 \times 3$
linear transformation $A \in GL(3)$,

$$Bs = (BA)(A^{-1}s) = B^* s^*.$$

In other words, the same image is produced when the albedo and surface normals
are transformed by $A$, while the light source is transformed by $A^{-1}$. Therefore,
without knowledge of the light source directions, we can only recover $B^*$ where
$B^* = BA$, see [10, 17]. Nonetheless $B^*$ is sufficient for determining the subspace
$\mathcal{L}$: it is easy to show that $\mathcal{L} = \{x \mid x = B^*s, \forall s \in \mathbb{R}^3\} = \{x \mid x = Bs, \forall s \in \mathbb{R}^3\}$,
see [33].
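
The ambiguity argument can be illustrated numerically. This is a minimal sketch with synthetic data: the matrix `B`, the transformation `A` and the pixel count are arbitrary stand-ins chosen for the example, and `projector` is a hypothetical helper. It checks that the transformed pair produces the same image and that the two matrices span the same subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                          # number of pixels (illustrative)
B = rng.normal(size=(n, 3))                     # stand-in for albedo-scaled normals
A = np.eye(3) + 0.3 * rng.normal(size=(3, 3))   # an assumed invertible 3x3 transformation
B_star = B @ A                                  # what can be recovered without light directions

def projector(M):
    """Orthogonal projector onto the column space of M."""
    Q, _ = np.linalg.qr(M)
    return Q @ Q.T

s = rng.normal(size=3)
s_star = np.linalg.solve(A, s)                  # transformed light source, A^{-1} s
same_image = np.allclose(B @ s, B_star @ s_star)
same_subspace = np.allclose(projector(B), projector(B_star))
print(same_image, same_subspace)
```

Comparing projectors rather than the matrices themselves is deliberate: `B` and `B_star` differ entry by entry, yet their column spaces coincide.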

Thus, for a convex object with Lambertian reflectance, we can determine
its appearance under arbitrary illumination from as few as three images of the
object - knowledge of the light source strength or direction is not needed, see
also [33]. To determine the illumination cone $\mathcal{C}$, we simply need to determine the
illumination subspace $\mathcal{L}$. In turn, we can choose any three images from the set
$\mathcal{L}_0$, each taken under a different lighting direction, as its basis vectors. Naturally,
if more images are available, they can be combined to find the best rank three
approximation to $\mathcal{L}$ using singular value decomposition (SVD).
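
A minimal sketch of the rank-three estimate, assuming the shadow-free images are stacked as columns of a matrix; `estimate_illumination_subspace` is a hypothetical name, and the synthetic data below ignore the physical constraint that pixel values are non-negative.

```python
import numpy as np

def estimate_illumination_subspace(images):
    """images: (n_pixels, k) array, one shadow-free image per column, k >= 3.
    Returns an orthonormal (n_pixels, 3) basis and all singular values."""
    X = images / np.linalg.norm(images, axis=0, keepdims=True)   # unit-length images
    U, sing, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :3], sing                                        # best rank-3 fit in least squares

# Synthetic check: six noisy images generated from a known B.
rng = np.random.default_rng(1)
B = rng.normal(size=(200, 3))
S = rng.normal(size=(3, 6))                                      # six light sources
images = B @ S + 1e-4 * rng.normal(size=(200, 6))
basis, sing = estimate_illumination_subspace(images)
residual = np.linalg.norm(B - basis @ (basis.T @ B)) / np.linalg.norm(B)
print(sing[:4], residual)
```

The fourth singular value is at the noise level and the columns of `B` are reproduced almost exactly by the estimated basis, which is what a three-dimensional subspace predicts.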

We should point out that for many convex surfaces the cone can be
constructed from as few as three images; however, this is not always possible. If
the object has surface normals covering the Gauss sphere, then there is only
one light source direction - the viewing direction - such that the entire visible
surface is illuminated. For any other light source direction, some portion of the
surface is shadowed. To determine $\mathcal{C}$, each point on the surface of the object must
be illuminated in at least three images; for this to be true over the entire visible
surface, as many as five images may be required. See [20] and Section 5.1 for
algorithms for determining $\mathcal{L}$ from images with shadowed pixels.

What may not be immediately obvious is that any point within the cone $\mathcal{C}$
(including the boundary points) can be found as a convex combination of the rays
(images) produced by light source directions lying at the $m(m-1)$ intersections
of the great circles on the illumination sphere. Each of these $m(m-1)$ rays
(images) is an extreme ray of the convex cone, because it cannot be expressed
as a convex combination of two other images in the cone. Furthermore, because
the cone is constructed from a finite number of extreme rays (images), the cone
is polyhedral.

These propositions and observations suggest the following algorithm for
constructing the illumination cone from three or more images:



Illumination Subspace Method: Gather images of the object under
varying illumination without shadowing and use these images to
estimate the three-dimensional illumination subspace $\mathcal{L}$. After normalizing
the images to be of unit length, singular value decomposition (SVD)
can be used to estimate the best orthogonal basis in a least squares
sense. From the illumination subspace $\mathcal{L}$, the extreme rays defining the
illumination cone $\mathcal{C}$ are then computed. Recall that an extreme ray is
an image created by a light source direction lying at the intersection of
two or more great circles. If there are $m$ independent surface normals,
there can be as many as $m(m-1)$ extreme rays (images). Let $b_i$ and
$b_j$ be rows of $B$ with $i \ne j$; the extreme rays are given by

$$x_{ij} = \max(Bs_{ij}, 0) \qquad (3)$$

where

$$s_{ij} = b_i \times b_j. \qquad (4)$$
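
Equations 3 and 4 translate almost directly into code. The sketch below is one possible reading, not a definitive implementation: `extreme_rays` is a hypothetical name, both orderings of each pair are generated (since $b_j \times b_i = -b_i \times b_j$), and pixels $i$ and $j$ are set to exactly zero because they lie on their own great circles and would otherwise carry round-off.

```python
import numpy as np
from itertools import combinations

def extreme_rays(B):
    """Return the images x_ij = max(B s_ij, 0), s_ij = b_i x b_j, as columns.
    B is an (m, 3) array whose rows are albedo-scaled surface normals."""
    rays = []
    for i, j in combinations(range(B.shape[0]), 2):
        s = np.cross(B[i], B[j])
        if np.linalg.norm(s) < 1e-12:       # parallel normals share one great circle
            continue
        for sign in (1.0, -1.0):            # b_i x b_j and b_j x b_i
            x = np.maximum(B @ (sign * s), 0.0)
            x[[i, j]] = 0.0                 # pixels i and j are exactly on the shadow boundary
            rays.append(x)
    return np.array(rays).T                 # shape (m, at most m(m - 1))

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 3))                 # five assumed normals in general position
print(extreme_rays(B).shape)
```

Some of the returned columns may be the zero image, namely those whose light source direction lies on the boundary of the cell where every facet is shadowed; this is consistent with the text's "as many as" $m(m-1)$ extreme rays.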

In Section 2.4, we use this method to experiment with images of real objects;
we use a small number of images to build the illumination subspace $\mathcal{L}$ and
then produce sample images from the illumination cone $\mathcal{C}$. To reduce storage
and computational requirements for applications using the cone, the images can
be projected down to a low dimensional subspace; any image in the projected
cone can be found as convex combinations of the projected extreme rays. Note,
however, that some of the projected extreme rays are redundant since an extreme
ray may project to the interior of the projected cone. As will be seen in the
experiments of Section 3.4, the illumination cones of real objects do lie near
a low dimensional subspace; thus dimensionality reduction by linear projection
may be justified.



A Two-Dimensional Example. To illustrate the relationship between an
object and its illumination cone, consider the simplified 2-D example in Fig. 3. An
object composed of three facets is shown in Fig. 3.a. For facet $i$, the product
of the albedo and surface normal is given by $b_i \in \mathbb{R}^2$. In this 2-D world, the
direction of a light source at infinity can be represented as a point on a circle.

Let us now consider a camera observing the three facets from above such
that each facet projects to one pixel, yielding an image $x = (x_1, x_2, x_3)^T \in \mathbb{R}^3$.
$\mathcal{L}$ is then a 2-D linear subspace of $\mathbb{R}^3$, and the set of images from a single light
source such that all pixels are illuminated, $\mathcal{L}_0 \subset \mathcal{L}$, is the 2-D convex cone shown
in Figure 3.b. The left edge of $\mathcal{L}_0$, where $x_3 = 0$, corresponds to the light source
direction where Facet 3 just goes into shadow, and similarly the right edge of
$\mathcal{L}_0$, where $x_1 = 0$, corresponds to the light source direction where Facet 1 just
goes into shadow. Now, for a single light source, the set of images is formed by
projecting $\mathcal{L}$ onto the positive orthant as shown in Figure 3.c. Note for example,
that the 2-D cone $P_1(\mathcal{L}_1)$ corresponds to the set of images in which Facets 1 and
2 are illuminated while Facet 3 is in shadow, and the 1-D ray $P_2(\mathcal{L}_2)$ corresponds
to the set of images with Facet 1 illuminated and Facets 2 and 3 shadowed. The
union $\bigcup_i P_i(\mathcal{L}_i)$ defines the walls of the illumination cone $\mathcal{C}$, and the entire
cone is formed by taking convex combinations of images on the walls.

As seen in Figure 3.d, the set of light source directions, represented here by
a circle, can be partitioned into regions $S_i$ such that all images produced by
light sources within a region have the same shadowing configurations. That is,
$S_i = \{s : |s| = 1, \max(Bs, 0) \in P_i(\mathcal{L}_i)\}$. The corresponding partitioning of light
source directions is shown in Figure 3.a.



2.2 Illumination Cones for Arbitrary Objects 

In the previous sub-section, we assumed that the objects were convex in shape 
and had Lambertian reflectance functions. The central result was that the set of 
images of the object under all possible illumination conditions formed a convex 
cone in the image space and that this illumination cone can be constructed from 
as few as three images. Yet, most objects are non-convex in shape and have
reflectance functions which can be better approximated by more sophisticated
physical [30, 36, 38] and phenomenological [22] models. The question again arises:
What can we say about the set of images of an object with a non-convex shape
and a non-Lambertian reflectance function?

Fig. 3. A 2-D Example: a. A surface with three facets is observed from above
and produces an image with pixels $x_1$, $x_2$ and $x_3$. b. The linear subspace $\mathcal{L}$
and its intersection with the positive quadrant $\mathcal{L}_0$. c. The "walls" of the cone
$P_i(\mathcal{L}_i)$ corresponding to images formed by a single light source. The illumination
cone $\mathcal{C}$ is formed by all convex combinations of images lying on the walls. d. The
geometry of facets leads to a partitioning of the illumination circle.

The proof of Proposition 2 required no assumptions about the shape of the
object, the nature of the light sources, or the reflectance function for the object's
surface. Consequently, we can state a more general proposition about the set of
images of an object under varying illumination:

Proposition 4. The set of $n$-pixel images of any object, seen under all possible
lighting conditions, is a convex cone in $\mathbb{R}^n$.

Therefore, even for a non-convex object with a non-Lambertian reflectance
function, the set of images under all possible lighting conditions still forms a convex
cone in the image space. This result is in some sense trivial, arising from the
superposition property of illumination: the image of an object produced by two
light sources is simply the addition of the two images produced by the sources
individually.

It is doubtful that the illumination cone for such objects can be constructed 
from as few as three images. This is not due to the non-convexity of objects and 
the shadows they cast. The structure of objects with Lambertian reflectance, but 
non-convex shapes, can be recovered up to a “generalized bas-relief” transfor- 
mation from as few as three images [2]. From this, it is possible to determine the 
cast shadows exactly. Rather, the difficulty is due to the fact that the reflectance 
function is unknown. To determine the reflectance function exactly could take an 
infinite number of images. However, the Illumination Subspace Method devel- 
oped in Section 2.1 can be used to approximate the cone, as will be seen in the 
empirical investigation of Section 2.4; such an approximation for a non-convex, 
non-Lambertian surface is used in the face recognition experiment in Section 5. 
An alternative method for approximating the cone is presented below: 

Sampling Method: Illuminate the object by a series of light source 
directions which evenly sample the illumination sphere. The resulting 
set of images is then used as the set of extreme rays of the approximate 
cone. 

Note that this approximate cone is a subset of the true cone and so any image 
contained within the approximate cone is a valid image. The Sampling Method 
has its origins in the linear subspace method proposed by Hallinan [15]; yet, it 
differs in that the illumination cone restricts the images to be convex - not linear 
- combinations of the extreme rays. This method is a natural way of extending 
the appearance manifold method of Murase and Nayar to account for multiple 
light sources and shadowing [27]. 
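
One way to obtain light source directions that "evenly sample the illumination sphere" is a Fibonacci (golden-ratio) lattice. This is an assumption of the sketch below, since the text does not prescribe a sampling scheme; `fibonacci_sphere`, `approximate_cone_rays` and the `capture` callback (which stands in for photographing the object under a given direction) are hypothetical names.

```python
import numpy as np

def fibonacci_sphere(k):
    """Return k approximately evenly spaced unit vectors as a (k, 3) array."""
    i = np.arange(k) + 0.5
    z = 1.0 - 2.0 * i / k                       # uniform in z gives uniform area on the sphere
    phi = np.pi * (1.0 + 5.0 ** 0.5) * i        # golden-ratio increments in longitude
    r = np.sqrt(1.0 - z * z)
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), z])

def approximate_cone_rays(capture, k=100):
    """capture(s) -> image vector for light direction s. Returns (n_pixels, k) rays."""
    return np.column_stack([capture(s) for s in fibonacci_sphere(k)])

# Stand-in for a real capture: a synthetic convex Lambertian object.
rng = np.random.default_rng(0)
B = rng.normal(size=(30, 3))
rays = approximate_cone_rays(lambda s: np.maximum(B @ s, 0.0), k=100)
print(rays.shape)
```

Any non-negative combination of the columns of `rays` lies in the approximate cone and, as noted above, is therefore a valid image of the object.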



2.3 Illumination Cones for Non-convex Lambertian Surfaces 

While the sampling method provides the means to approximate the cone for
objects with arbitrary geometry and reflectance functions and illuminated by
multiple light sources, it is necessary to have observed the object under many
lighting conditions to obtain a good approximation. On the other hand, if the
object is convex and Lambertian, the illumination subspace method can be used to
construct the entire cone from only three images. Here we consider an
intermediate situation where the surface is Lambertian but non-convex. Most significantly,
non-convex objects can cast shadows upon themselves. Whereas attached
shadows are defined by a local condition (see Equation 1), cast shadows are global in
nature. Nonetheless, from Section 2.2 we know that the set of images must still
be a cone; here we show how an approximation of this cone can be constructed
from as few as three images.

The illumination subspace method suggests a starting point for constructing
the illumination cone: gather three or more images of an object under varying
illumination without shadowing and use these images to estimate the
three-dimensional illumination subspace $\mathcal{L}$. Note that the estimated basis $B^*$ differs
from the true $B$ (whose rows are the surface normals scaled by the albedo) by an
unknown linear transformation, i.e., $B = B^*A$ where $A \in GL(3)$; for any light
source, $Bs = (BA)(A^{-1}s)$. Nonetheless, for a convex object, the extreme rays
defining the illumination cone $\mathcal{C}$ can be computed from Equations 3 and 4 using
$B^*$. For a non-convex object, cast shadows can cover significant portions of the
visible surface when the angle of the light source with respect to the viewing
direction is large (extreme illumination); see the images from Subsets 4 and 5 in
Fig. 9. Yet the image formation model (Eq. 1) used to develop the illumination
cone in Section 2.1 does not account for cast shadows.

It has been shown in [2, 42] and in this book that from multiple images
where the light source directions are unknown, one can only recover a Lambertian
surface up to a three-parameter family given by the generalized bas-relief (GBR)
transformation. This family is a restriction on $A$, and it has the effect of scaling
the relief (flattening or extruding) and introducing an additive plane. Since both
shadows and shading are preserved under these transformations [2, 23], images
synthesized from a surface whose normal field is given by $B^*$ under light source
$s^*_{ij}$ will have correct shadowing. Thus, to construct the extreme rays of the cone,
we first reconstruct a Lambertian surface (a height function plus albedo) from
$B^*$. This surface is not an approximation of the original surface, but rather a
representative element of the orbit of the original surface under GBR. For a given
light source direction $s^*$, ray-tracing techniques can be used to determine which
surface points lie in a cast shadow. Whereas for convex Lambertian objects, the
illumination sphere is partitioned into $m(m-1)+2$ regions by $m$ great circles,
the illumination sphere will be partitioned by more complex curves for
non-convex Lambertian objects, and so it is expected that there will be many more
shadowing configurations. As such, it is unlikely that an exact representation of
the cone could be used in practice. This approximate cone is a subset of the true
cone when there is no imaging noise.

These observations lead to the following steps for constructing an
approximation to the illumination cone of a non-convex Lambertian surface from a set
of images taken under unknown lighting.

Cast Shadow Method:

1. Gather images of the object under varying illumination without 
shadowing. 

2. Estimate B* from training images. 

3. Reconstruct a surface up to GBR. 

4. For a set of light source directions that uniformly sample the sphere, 
use ray-tracing to synthesize images from the reconstructed surface 
that account for both cast and attached shadows. 

5. Use synthetic images as extreme rays of cone. 

More details of these steps as applied to face recognition will be provided in 
Section 5.1. 
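
Step 4 of the method above can be sketched for a surface given as a height field. This is a simplified illustration under stated assumptions, not the ray tracer used in the experiments: the surface is a regular grid `height[y, x]`, the light is distant, occlusion is tested by marching one pixel at a time toward the light with nearest-neighbour lookup, and `render_height_field` is a hypothetical name.

```python
import numpy as np

def render_height_field(height, albedo, s):
    """Lambertian image of z = height[y, x] under a distant light with direction s,
    including attached shadows (max operation) and cast shadows (ray marching)."""
    h, w = height.shape
    s = np.asarray(s, dtype=float)
    s = s / np.linalg.norm(s)
    dzdy, dzdx = np.gradient(height)                       # finite-difference slopes
    normals = np.dstack([-dzdx, -dzdy, np.ones_like(height)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    image = albedo * np.maximum(normals @ s, 0.0)          # attached shadows

    horiz = np.hypot(s[0], s[1])
    if horiz < 1e-12:                                      # overhead light: no cast shadows
        return image
    step = s / horiz                                       # one pixel of horizontal travel per step
    ys, xs = np.mgrid[0:h, 0:w]
    shadowed = np.zeros((h, w), dtype=bool)
    for t in range(1, h + w):
        px = np.rint(xs + t * step[0]).astype(int)
        py = np.rint(ys + t * step[1]).astype(int)
        inside = (px >= 0) & (px < w) & (py >= 0) & (py < h)
        if not inside.any():                               # every ray has left the grid
            break
        ray_z = height + t * step[2]                       # height of each ray after t steps
        hit = np.zeros((h, w), dtype=bool)
        hit[inside] = height[py[inside], px[inside]] > ray_z[inside] + 1e-9
        shadowed |= hit                                    # surface rises above the ray
    image[shadowed] = 0.0
    return image

# Toy scene (assumed): a flat plane with a wall, lit at 45 degrees from the +x side.
height = np.zeros((5, 20))
height[:, 10] = 5.0
image = render_height_field(height, 1.0, (1.0, 0.0, 1.0))
print(image[2, 8], image[2, 15])
```

In the toy scene, pixels just behind the wall (smaller x) fall in its cast shadow while pixels on the lit side keep their Lambertian shading. Images rendered this way for many light source directions would play the role of the synthetic extreme rays in step 5.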

2.4 An Empirical Investigation: Building Illumination Cones 

To demonstrate the power of these concepts, we have used the Illumination 
Subspace Method to construct the illumination cone for two different scenes: a 
human face and a desktop still life. To construct the cone for the human face, we 
used images from the Harvard Face Database [15], a collection of images of faces 
seen under a range of lighting directions. For the purpose of this demonstra- 
tion, we used the images of one person, taking six images with little shadowing 
and using singular value decomposition (SVD) to construct a 3-D basis for the 
illumination subspace $\mathcal{L}$. Note that this 3-D linear subspace differs from the
affine subspace constructed using the Karhunen-Loeve transform: the mean
image is not subtracted before determining the basis vectors as in the Eigenpicture
methods [24, 39].

The illumination subspace was then used to construct the illumination cone 
C. We generated novel images of the face as if illuminated by one, two, or three 
point light sources by randomly sampling the illumination cone. Rather than con- 
structing an explicit representation of the half-spaces bounding the illumination 
cone, we sampled $\mathcal{L}$, determined the corresponding orthant, and appropriately
projected the image onto the illumination cone. Images constructed under multi- 
ple light sources simply correspond to the superposition of the images generated 
by each of the light sources. 
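
The sampling procedure just described can be sketched as follows. The sketch assumes a convex Lambertian model, so that projecting a sampled point of the subspace onto the cone amounts to zeroing its negative components; `sample_cone_image` is a hypothetical name, and the random `basis` below is only a stand-in for the three basis images obtained by SVD.

```python
import numpy as np

def sample_cone_image(basis, n_lights, rng):
    """Draw one image from the cone as a superposition of n_lights single-source images.
    basis: (n_pixels, 3) array spanning the estimated illumination subspace."""
    image = np.zeros(basis.shape[0])
    for _ in range(n_lights):
        s = rng.normal(size=3)                  # random coordinates within the subspace
        x = basis @ s                           # a point of the subspace; its signs give the orthant
        image += np.maximum(x, 0.0)             # zero the negative (shadowed) components
    return image

rng = np.random.default_rng(0)
basis = rng.normal(size=(64, 3))                # placeholder for real basis images
samples = [sample_cone_image(basis, k, rng) for k in (1, 2, 3)]
print([float(im.min()) >= 0.0 for im in samples])
```

No explicit description of the half-spaces bounding the cone is needed: each sample is generated directly from a light source, so it lies in the cone by construction.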

The top two rows of Fig. 4 show all six low resolution images of a person's
face that were used to construct the basis of the linear subspace $\mathcal{L}$. The bottom
row of Fig. 4 shows three basis images that span $\mathcal{L}$. The three columns
of Fig. 5 show sample images from the illumination cone for
the face with one, two, and three light sources, respectively.

There are a number of points to note about this experiment. There was almost
no shadowing in the training images, yet there are strong attached shadows in
many of the sample images. These are particularly distinct in the images
generated with a single light source. Notice for example the sharp shadow across
the ridge of the nose in Column 1, Row 2 or the shadowing in Column 1, Row 4
where the light source is coming from behind the head. Notice also the
depression under the cheekbones in Column 2, Row 5, and the cleft in the chin revealed
in Column 1, Row 3. For the image in Column 3, Row 2, two of the light sources
are on opposite sides while the third one is coming from below; notice that both
ears and the bottom of the chin and nose are brightly illuminated while the rest
of the face is darker.

[Figure 4 panels: Original Images; Basis Images]

Fig. 4. Illumination Subspace Method: The top two rows of the figure show
all six of the original images used to construct the illumination subspace $\mathcal{L}$ for
the face. The bottom row of the figure shows three basis images that span the
illumination subspace $\mathcal{L}$ for the face.

To construct the cone for the desktop still life, we used our own collection
of nine images with little shadowing. The top row of Fig. 6 shows three of these
images. The second row of Fig. 6 shows the three basis images that span $\mathcal{L}$. The
lower three columns of Fig. 6 show sample images from
the illumination cone for the desktop still life with one, two, and three light sources, respectively.

The variability in illumination in these images is so extreme that the edge
maps for these images would differ drastically. Notice in the image in Column
1, Row 4 that the shadow line on the bottle is distinct and that the left sides
of the phone, duck, and bottle are brightly illuminated. Throughout the scene,
notice that those points having comparable surface normals seem to be similarly
illuminated. Furthermore, notice that all of the nearly horizontal surfaces in the
bottom two images of the first column are in shadow since the light is coming
from below. In the image with two light sources shown at the bottom of Column
2, the sources are located on opposite sides and behind the objects. This leads
to a shadow line in the center of the bottle. The head of the wooden duck shows
a similar shadowing where its front and back are illuminated, but not the side.

[Figure 5 panels: 1 Light; 2 Lights; 3 Lights]

Fig. 5. Random Samples from the Illumination Cone of a Face: The
three columns show sample images from the illumination
cone with one, two, and three light sources, respectively.

[Figure 6 panels: 1 Light; 2 Lights; 3 Lights]

Fig. 6. Illumination Subspace Method: The top row of the figure shows
three of the original nine images used to construct the illumination subspace
$\mathcal{L}$ for the still life. The second row shows the three basis images that span the
illumination subspace $\mathcal{L}$. The lower three columns show
sample images from the illumination cone with one, two, and three light sources, respectively.

3 Dimension and Shape of the Illumination Cone 

In this section, we investigate the dimension of the illumination cone, and show 
that it is equal to the number of distinct surface normals. However, we conjecture 
that the shape of the cone is flat, with much of its volume concentrated near a 
low-dimensional subspace, and present empirical evidence to support this con- 
jecture. Finally, we show that the cones of two objects with the same geometry, 
but with separate albedo patterns, differ by a diagonal linear transformation. 



3.1 The Dimension of the Illumination Cone 

Given that the set of images of an object under variation in illumination is a
convex cone, it is natural to ask: What is the dimension of the cone in $\mathbb{R}^n$?
By this we mean, what is the dimension of the span of the vectors in the illumination cone $\mathcal{C}$?
Why do we want to know the answer to this question? Because the complexity
of the cone may dictate the nature of the recognition algorithm. For example,
if the illumination cones are 1-D, i.e., rays in the positive orthant of $\mathbb{R}^n$, then a
recognition scheme based on normalized correlation could handle all of the
variation due to illumination. However, in general the cones are not one dimensional
unless the object is planar. To this end, we offer the following proposition.

Proposition 5. The dimension of the illumination cone C is equal to the num- 
ber of distinct surface normals. 

Proof. As with the proof of Proposition 1, we again represent each light source
direction by a point on the surface of the illumination sphere. Each cell on the
illumination sphere corresponds to the light source directions which produce a
particular set of images $P_i(\mathcal{L}_i)$. For every image in a set $P_i(\mathcal{L}_i)$, certain pixels
are always equal to zero, i.e., always in shadow. There exists a cell $S_0$ on the
illumination sphere corresponding to the light source directions which produce
$\mathcal{L}_0$, the set of images in which all pixels are always illuminated. There exists a
cell $S_d$ corresponding to the light source directions which produce a set of images
in which all pixels are always in shadow. Choose any point $s_b \in S_0$. The point
$s_d = -s_b$ is antipodal to $s_b$, and lies within $S_d$. Draw any half-meridian connecting
$s_b$ and $s_d$. Starting at $s_b$, follow the path of the half-meridian; it crosses $m$ distinct
great circles, and passes through $m$ different cells before entering $S_d$. Note that
the path of the half-meridian corresponds to a particular path of light source
directions, starting from a light source direction producing an image in which all
pixels are illuminated and ending at a light source direction producing an image
in which all pixels are in shadow. Each time the half-meridian crosses a great
circle, the pixel corresponding to the great circle becomes shadowed.

Take an image produced from any light source direction within the interior
of each cell through which the meridian passes, including $S_0$, but excluding $S_d$.
Arrange each of these $m$ images as column vectors in an $n \times m$ matrix $M$. By
elementary row operations, the matrix $M$ can be converted to its echelon form
$M^*$, and it is trivial to show that $M^*$ has exactly $m$ non-zero rows. Thus, the
rank of $M$ is $m$, and the dimension of $\mathcal{C}$ is at least $m$. Since there are only
$m$ distinct surface normals, the dimension of $\mathcal{C}$ cannot exceed $m$. Thus, the
dimension of $\mathcal{C}$ equals $m$.
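
The construction in this proof can be reproduced numerically. The sketch below assumes $n = m$ pixels with distinct random normals that all face the starting direction $s_b = (0, 0, 1)$, and walks the half-meridian $s(\theta) = (\sin\theta, 0, \cos\theta)$ for $\theta$ from $0$ to $\pi$; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 8
B = rng.normal(size=(m, 3))
B[:, 2] = np.abs(B[:, 2]) + 0.1                  # every facet is lit from s_b = (0, 0, 1)

# Pixel k becomes shadowed where b_k . s(theta) = 0, i.e. at theta_k = atan2(b_z, -b_x).
theta_cross = np.arctan2(B[:, 2], -B[:, 0])
edges = np.concatenate([[0.0], np.sort(theta_cross)])
mids = 0.5 * (edges[:-1] + edges[1:])            # one direction inside each of the m cells
S = np.stack([np.sin(mids), np.zeros(m), np.cos(mids)])   # 3 x m light source directions
M = np.maximum(B @ S, 0.0)                       # n x m matrix of images (here n = m)

print((M == 0).sum(axis=0))                      # shadowed pixels per image
print(np.linalg.matrix_rank(M))
```

Successive images along the path have 0, 1, ..., m - 1 shadowed pixels, so after reordering the rows $M$ is triangular with a non-zero diagonal, which is why its rank equals $m$.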

Note that for images with $n$ pixels, this proposition indicates that the
dimension of the illumination cone is one for a planar object, is roughly $\sqrt{n}$ for a
cylindrical object, and is $n$ for a spherical object. But if the cone spans $\mathbb{R}^n$, what
fraction of the positive orthant does it occupy? In Section 3.3, we investigate this
question, conjecturing that the illumination cones for most objects occupy little
volume in the image space.

3.2 The Connection between Albedo and Cone Shape 

If two objects are similar in geometry, but differ in their respective albedo
patterns, then there is a simple linear relationship between their corresponding
illumination cones. Here, we consider two Lambertian objects that have the same
underlying geometry, but have differing albedo patterns (e.g., a Coke can and a
Pepsi can). In this case, the product of albedo and surface normals for the two
objects can be expressed as $B_1 = R_1 N$ and $B_2 = R_2 N$ where $N$ is an $n \times 3$
matrix of surface normals and $R_i$ is an $n \times n$ diagonal matrix whose diagonal
elements are positive and represent the albedo. The following proposition relates
the illumination cones of the two objects.

Proposition 6. If $\mathcal{C}_1$ is the illumination cone for an object defined by $B_1 =
R_1 N$ and $\mathcal{C}_2$ is the illumination cone for an object defined by $B_2 = R_2 N$, then

$$\mathcal{C}_1 = \{R_1 R_2^{-1} x : x \in \mathcal{C}_2\} \quad \text{and} \quad \mathcal{C}_2 = \{R_2 R_1^{-1} x : x \in \mathcal{C}_1\}.$$

Proof. For every light source direction $s$, the corresponding images are given by
$x_1 = \max(B_1 s, 0) = R_1 \max(Ns, 0)$ and $x_2 = \max(B_2 s, 0) = R_2 \max(Ns, 0)$.
Since $R_1$ and $R_2$ are diagonal with positive diagonal elements, they are invertible.
Therefore, $x_1 = R_1 R_2^{-1} x_2$ and $x_2 = R_2 R_1^{-1} x_1$.

Thus, the cones for two objects with identical geometry but differing albedo 
patterns differ by a diagonal linear transformation. This fact can be applied 
when computing cones for objects observed by color (multi-band) cameras as 
noted in Section 4. Note that this proposition also holds when the objects are 
non-convex; since the partitioning of the illumination sphere is determined by 
the objects’ surface geometry, the set of shadowing configurations is identical for 
two objects with the same shape. The intensities of the illuminated pixels are 
related by the transformations given in the proposition. 
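
The relation between the two cones is easy to confirm for single-source images. A minimal sketch with synthetic data follows; the normals and both albedo patterns are random stand-ins, and the diagonal matrices are stored as vectors `r1` and `r2`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
N = rng.normal(size=(n, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)    # shared unit surface normals
r1 = rng.uniform(0.2, 1.0, size=n)               # diagonal of R_1 (albedo of object 1)
r2 = rng.uniform(0.2, 1.0, size=n)               # diagonal of R_2 (albedo of object 2)

s = rng.normal(size=3)                           # an arbitrary light source
x1 = np.maximum((r1[:, None] * N) @ s, 0.0)      # image of object 1
x2 = np.maximum((r2[:, None] * N) @ s, 0.0)      # image of object 2

print(np.allclose(x1, (r1 / r2) * x2))           # x_1 = R_1 R_2^{-1} x_2
print(np.array_equal(x1 == 0, x2 == 0))          # identical shadowing pattern
```

Because the map is diagonal with positive entries, it rescales each pixel independently and leaves the set of shadowed pixels unchanged.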

3.3 Shape of the Illumination Cone

While we have shown that an illumination cone is a convex, polyhedral cone
that can span $n$ dimensions if there are $n$ distinct surface normals, we have
not said how big it is in practice. Note that having a cone span $n$ dimensions
does not mean that it covers $\mathbb{R}^n$, since a convex cone is defined only by convex
combinations of its extreme rays. It is conceivable that an illumination cone
could completely cover the positive orthant of $\mathbb{R}^n$. However, the existence of an
object geometry that would produce this is unlikely. For such an object, it must
be possible to choose $n$ light source directions such that each of the $n$ facets is
illuminated independently.

On the other hand, if the illumination cones for objects are small and well 
separated, then recognition should be possible, even under extreme lighting con- 
ditions. We believe that the latter is true - that the cone has almost no volume 
in the image space. We offer the following conjecture: 

Conjecture 1. The shape of the cone is “flat,” i.e., most of its volume is
concentrated near a low-dimensional subspace.

While we have yet to prove this conjecture, the empirical investigations of [9, 15] 
and the one in the following section seem to support it. 

3.4 Empirical Investigation of the Shape of the Illumination Cones 

To investigate Proposition 5 and Conjecture 1, we have gathered several images,
taken under varying lighting conditions, of two objects: the corner of a cardboard
box and a Wilson tennis ball. For both objects, we computed the illumination
subspace by applying SVD to their corresponding sets of images. Using the estimated
illumination subspaces, we performed two experiments.

In the first experiment, we tried to confirm that the illumination spheres for 
both objects would appear as we would expect. For both the box and the tennis 
ball, we drew the great circles associated with each pixel on the illumination 
sphere, see Fig. 7. From Proposition 5, we would expect the illumination cone 
produced by the corner of the box to be three dimensional since the corner 
has only three faces. The illumination sphere should be partitioned into eight 
regions by three great circles, each meeting the other two orthogonally. This 
structure is evident in the figure. Yet, due to both image noise and the fact that 
the surface is not truly Lambertian, there is some small deviation of the great 
circles. Furthermore, the few pixels from the edge and corner of the box produce 
a few stray great circles. In contrast, the visible surface normals of the tennis 
ball should nearly cover half of the Gauss sphere and, therefore, the great circles 
should nearly cover the illumination sphere. Again, this structure is evident in 
the figure. 

In the second experiment, we plotted the eigenvalues of the matrix of extreme
rays for both the box and the tennis ball. The point of this experiment was to
compare the size and “flatness” of both cones. As discussed in Section 2.1, an
extreme ray $x_{ij}$ is an image created by a light source direction $s_{ij}$ lying at the
intersection of two or more great circles on the illumination sphere. The matrix
of extreme rays is simply the matrix whose columns are the vectorized images
$x_{ij}/|x_{ij}|$. We then performed SVD on the matrix of extreme rays for the box
corner and the matrix of extreme rays for the tennis ball. The corresponding
eigenvalues are plotted in decreasing order in Fig. 8.

Fig. 7. Examples of Illumination Spheres: On the left, the figure shows an
image of the corner of a cardboard box and its corresponding illumination sphere.
Note that the illumination sphere is, for the most part, partitioned into eight
regions by three great circles, each meeting the other two orthogonally. On the
right, the figure shows an image of a Wilson tennis ball and its corresponding
illumination sphere. Note that the great circles nearly cover the illumination
sphere.

From this figure we make the following observations. First, in the plot of the 
box corner there is a sharp drop-off after the third eigenvalue, indicating that 
most of the illumination cone is concentrated near a 3-D subspace of the image 
space. Second, the eigenvalues for the tennis ball do not drop-off as quickly 
as those for the box, indicating that the illumination cone for the tennis ball 
is larger than that for the box. And, third, the eigenvalues for both the box 
corner and the tennis ball diminish by at least two orders of magnitude within 
the first fifteen eigenvalues. Thus, in agreement with the above conjecture, the 
illumination cones appear to be concentrated near a low dimensional subspace. 
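
The experiment described above can be sketched numerically. The following is a minimal sketch, not the original experimental code: it assumes that each extreme ray is the image max(B s, 0) produced by a source direction s = ±(b_i × b_j), where b_i and b_j are rows of B, and the function names `extreme_ray_matrix` and `spectrum` are illustrative.

```python
import numpy as np

def extreme_ray_matrix(B, eps=1e-12):
    """Return a matrix whose columns are normalized extreme rays.

    B is an (n, 3) array whose rows are albedo-scaled surface normals.
    Assumption: each extreme ray is max(B s, 0) for s = +/-(b_i x b_j).
    """
    n = B.shape[0]
    cols = []
    for i in range(n):
        for j in range(i + 1, n):
            s = np.cross(B[i], B[j])
            if np.linalg.norm(s) < eps:        # parallel normals: no direction
                continue
            for sign in (1.0, -1.0):
                x = np.maximum(B @ (sign * s), 0.0)   # attached shadows clip to 0
                norm = np.linalg.norm(x)
                if norm > eps:
                    cols.append(x / norm)             # column x_ij / |x_ij|
    return np.column_stack(cols)

def spectrum(E):
    """Singular values of the extreme-ray matrix, in decreasing order."""
    return np.linalg.svd(E, compute_uv=False)
```

Plotting `spectrum(extreme_ray_matrix(B))` on a logarithmic axis gives a curve of the kind shown in Fig. 8; how quickly it decays depends on the surface normals in B.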

[Figure: two panels, "Plot of Box Eigenvalues for Matrix of Extreme Rays" and "Plot of Ball Eigenvalues for Matrix of Extreme Rays"; axis label: Eigenvalue.] 




Fig. 8. Eigenvalues for the Matrix of Extreme Rays: The figure shows a 
plot in decreasing order of the eigenvalues of the matrix of extreme rays for the 
illumination cone of the corner of a box and for the illumination cone of a tennis 
ball. 



We should point out that Epstein et al. [9] and Hallinan [15] performed a 
related experiment on images created by physically moving the light source to 
evenly sample the illumination sphere. They, too, found that the set of images 
of an object under variable illumination lies near a low dimensional subspace. 
Our results using synthesized images from the cone seem to complement their 
findings. 

4 Color 

Until now, we have neglected the spectral distribution of the light sources, the 
color of the surface, and the spectral response of the camera; here we extend the 
results of Section 2 to multi-spectral images. 

Let $\lambda$ denote the wavelength of light. Let $\rho_i(\lambda)$ denote the response for all 
elements of the ith color channel. Let $R(\lambda)$ be a diagonal matrix whose elements 
are the spectral reflectance functions of the facets, and let the rows of 
$N \in \mathbb{R}^{n \times 3}$ be the surface normals of the facets. Finally, let $\tilde{s}(\lambda)$ and $\hat{s}$ be the power 
spectrum and direction of the light source, respectively. Then, ignoring attached 
shadows and the associated max operation, the n-pixel image $x_i$ produced by 
color channel i of a convex Lambertian surface from a single colored light is [18, 
21] 

$$x_i = \int \rho_i(\lambda)\,(R(\lambda)N)\,(\tilde{s}(\lambda)\hat{s})\,d\lambda. \qquad (5)$$

It is difficult to make limiting statements about the set of possible images 
of a colored object when $\rho_i(\lambda)$, $R(\lambda)$ and $\tilde{s}(\lambda)$ are arbitrary. For example, if 
we consider a particular object with a spectral reflectance function $R(\lambda)$ and 
surface normals N, then without constraining assumptions on $\rho_i(\lambda)$ and $\tilde{s}(\lambda)$, 
any image $x_i$ is obtainable. Consequently, we will consider two specific cases: 
cameras with narrow-band spectral response and light sources with identical 
spectral distributions. 



4.1 Narrow-Band Cameras 

Following [29], if the sensing elements in each color channel have narrow-band 
spectral response or can be made to appear narrow band [11], then $\rho_i(\lambda)$ can be 
approximated by a Dirac delta function about some wavelength $\lambda_i$, and Eq. 5 
can be rewritten as 

$$x_i = \rho_i(\lambda_i)\,(R(\lambda_i)N)\,(\tilde{s}(\lambda_i)\hat{s}) = \rho_i\,(R_i N)\,(\tilde{s}_i\hat{s}). \qquad (6)$$

Note that $\rho_i$, $R_i$ and N are constants for a given surface and camera whereas 
$\tilde{s}_i$ and $\hat{s}$ depend on properties of the light source. Eq. 6 can be expressed using 
the notation of Eq. 1 where $B = \rho_i R_i N$ and $s = \tilde{s}_i \hat{s}$. The diagonal elements of 
$\rho_i R_i$ are the effective albedo of the facets for color channel i. For c narrow-band 
color channels, the color image $x = [x_1^T \,|\, x_2^T \,|\, \cdots \,|\, x_c^T]^T$ formed by stacking up the c 
images for each channel can be considered as a point in $\mathbb{R}^{cn}$. Under a single light 
source, x is a function of $\hat{s}$ and $\tilde{s}_1 \cdots \tilde{s}_c$. Taken over all light source directions 
and spectral distributions, the set of images from a single light source without 
shadowing is a c + 2 dimensional manifold in $\mathbb{R}^{cn}$. It is easy to show that this 
manifold is embedded in a 3c-dimensional linear subspace of $\mathbb{R}^{cn}$, and that any 
point (image) in the intersection of this linear subspace with the positive orthant 
of $\mathbb{R}^{cn}$ can be achieved by three colored light sources. 
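
The stacking construction and the 3c-dimensional bound can be checked numerically. The sketch below is illustrative only: it follows Eq. 6 without shadowing, and the function name `color_image`, the array layout, and the random test data are assumptions rather than part of the original work.

```python
import numpy as np

def color_image(N, albedos, direction, channel_power):
    """Stack c narrow-band channel images into one vector of length c*n.

    N:             (n, 3) surface normals.
    albedos:       (c, n) effective albedo of each facet per channel
                   (the diagonal of rho_i R_i).
    direction:     (3,) unit light source direction.
    channel_power: (c,) source power seen by each channel.
    No shadowing is modeled, so entries may be negative.
    """
    channels = [(alb[:, None] * N) @ (power * direction)
                for alb, power in zip(albedos, channel_power)]
    return np.concatenate(channels)
```

Collecting such images for many directions and channel powers into the columns of a matrix and calling `np.linalg.matrix_rank` on it should give a rank of at most 3c, in line with the subspace claim above.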

A basis for this 3c-dimensional subspace can be constructed from three color 
images without shadowing. This is equivalent to independently constructing c 
three-dimensional linear subspaces in $\mathbb{R}^n$, one for each color channel. Note that 
$\rho_i R_i N$ spans subspace i. When attached shadows are considered, an illumination 
cone can be constructed in $\mathbb{R}^n$ for each color channel independently. The cones 
for each color channel are closely related since they arise from the same surface; 
effectively the albedo matrix $R_i$ may be different for each color channel, but 
the surface normals N are the same. As demonstrated in Section 3.2, the cones 
for two surfaces with the same geometry, but different albedo patterns differ by 
a diagonal linear transformation. Now, the set of all multi-spectral images of 
a convex Lambertian surface is a convex polyhedral cone in $\mathbb{R}^{cn}$ given by the 
Cartesian product of the c individual cones. Following Proposition 5, this color 
cone spans at most cm dimensions where m is the number of distinct surface 
normals. 



4.2 Light Sources with Identical Spectra 

Consider another imaging situation in which a color camera (c channels, not 
necessarily narrow-band) observes a scene where the number and location of 
the light sources are unknown, but the power spectral distributions of all light 
sources are identical (e.g., incandescent bulbs). Equation 5 can then be rewritten 
as 

$$x_i = \left( \int \rho_i(\lambda)\,\tilde{s}(\lambda)\,R(\lambda)\,d\lambda \right) N \hat{s}. \qquad (7)$$

In this case, the integral is independent of the light source direction and scales 
with its intensity. If we define the intensity of the light source to be $\tilde{s} = \int \tilde{s}(\lambda)\,d\lambda$, 
then $s = \tilde{s}\hat{s}$ and $R_i = \frac{1}{\tilde{s}} \int \rho_i(\lambda)\,\tilde{s}(\lambda)\,R(\lambda)\,d\lambda$. Equation 7 can then be expressed 
as 

$$x_i = R_i N s.$$

For c color channels, the color image $x \in \mathbb{R}^{cn}$ is given by 

$$x = [R_1 \,|\, R_2 \,|\, \cdots \,|\, R_c]^T N s.$$

Consequently, the set of images of the surface without shadowing is a three- 
dimensional linear subspace of $\mathbb{R}^{cn}$ since $R_i$ and N are constants. Following 
Section 2, the set of all images with shadowing is a convex polyhedral cone 
that spans m dimensions of $\mathbb{R}^{cn}$. Thus, when the light sources have identical 
power spectra (even if the camera is not narrow-band), the set of all images 
is significantly smaller than considered above since the color measured at each 
pixel is independent of the light source direction. 
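
A small numerical check of this case, under the same no-shadowing assumption: the sketch below stacks the channels as in the expression for x above, and the function name and array layout (`R` holds the diagonal of each R_i as a row) are illustrative choices, not taken from the original.

```python
import numpy as np

def color_image_identical_spectra(N, R, s):
    """Color image for sources sharing one spectrum, without shadowing.

    N: (n, 3) surface normals.
    R: (c, n) array; row i holds the diagonal of R_i.
    s: (3,) light source direction scaled by its intensity.
    Returns a vector of length c*n (channel 1 first, then channel 2, ...).
    """
    shading = N @ s                    # (n,) term shared by every channel
    return (R * shading).reshape(-1)   # row i is R_i N s
```

Because every channel shares the factor N s, the ratio between two channels at a pixel is fixed by R alone, and a matrix of such images taken under many sources should have rank 3.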

5 Face Recognition Using the Illumination Cone 

Until this point, we have focused on properties of the set of images of an object 
under varying illumination. Here, we utilize these properties to develop repre- 
sentations and algorithms for recognizing objects, namely faces, under differ- 
ent lighting conditions. Face recognition is a challenging yet well-studied prob- 
lem [8, 32]; the difficulty in face recognition arises from the fact that many faces 
are geometrically and photometrically very similar, yet there is a great deal of 
variability in the images of an individual due to changes of pose, lighting, facial 
expression, facial hair, hair style, makeup, age, etc. Here we focus solely on il- 
lumination, and in this section, we empirically compare these new methods to 
a number of popular techniques such as correlation [6] and Eigenfaces [24, 39] 
as well as more recently developed techniques such as distance to linear sub- 
space [3, 15, 29, 34]. 

5.1 Constructing the Illumination Cone Representation of Faces 

In the experiments reported below, illumination cones are constructed using vari- 
ations of the illumination subspace method and the cast shadow method. When 
implementing these methods, there are two problems which must be addressed. 

The first problem that arises with these two methods is with the estimation 
of B* . For even a convex object whose Gaussian image covers the Gauss sphere, 
there is only one light source direction - the viewing direction - for which no 
point on the surface is in shadow. For any other light source direction, shadows 

[Figure: example face images, one group per subset: Subset 1, Subset 2, Subset 3, Subset 4, Subset 5.] 




Fig. 9. Example images from each subset of the Harvard Database used to test 
the algorithms. 



will be present. For faces, which are not convex, shadowing in the modeling im- 
ages is likely to be more pronounced. When SVD is used to estimate B* from 
images with shadows, these systematic errors can bias the estimation signifi- 
cantly. Therefore, alternative ways are needed to estimate B* that take into 
account the fact that some data values should not be used in the estimation. 

The next problem is that m, the number of independent normals in B, is usually 
large (more than a thousand); hence the number of extreme rays needed 
to completely define the illumination cone can run into the millions. Therefore, we 
must approximate the cone in some fashion; in this work, we choose to use a 
small number of extreme rays (images). In Section 3.4 it was shown empirically 
that the cone is flat (i.e., elements lie near a low dimensional linear subspace), 
and so the hope is that a sub-sampled cone will provide an approximation that 
leads to good recognition performance. In our experience, around 60-80 images 
are sufficient, provided that the corresponding light source directions $s_{ij}$ more or 
less uniformly sample the illumination sphere. The resulting cone C* is a subset 
of the object's true cone C. In the Sampling Method described in Section 2.2, 
an alternative approximation to C is obtained by directly sampling the space of 
light source directions rather than generating the extreme rays through Eq. 4. 
While the resulting images form the extreme rays of the representation C* and 
lie on the boundary of the true cone C, they are not necessarily extreme rays of 
C. 



Estimating B* Using singular value decomposition directly on the images 
leads to a biased estimate of B* due to shadows. In addition, portions of some of 
the images from the Harvard database used in our experiments were saturated. 
Both shadows formed under a single light source and saturations can be detected 
by thresholding and labeled as “missing” - these pixels do not satisfy the linear 
equation x = Bs. Thus, we need to estimate the 3-D linear subspace B* from 
images with missing values. 

Define the data matrix for c images of an individual to be $X = [x_1 \ldots x_c]$. 
If there were no shadowing, X would be rank 3, and we could use SVD to 
decompose X into X = B*S* where S* is the 3 x c matrix of the light source 
directions for all c images. To estimate a basis B* for the 3-D linear subspace 
$\mathcal{L}$ from image data with missing elements, we have implemented a variation of 
[35]; see also [37, 20]. 

The overview of this method is as follows: without doing any row or column 
permutations, sift out all the full rows (with no invalid data) of matrix X to form 
a full sub-matrix $\tilde{X}$. Perform SVD on $\tilde{X}$ and get an initial estimate of S*. Fix 
S* and estimate each of the rows of B* independently using least squares. Then, 
fix B* and estimate each of the light source directions $s_j$ independently. Repeat 
the last two steps until the estimates converge. The inner workings of the algorithm are 
given as follows: Let $b_i$ be the ith row of B*, and let $x_i$ be the ith row of X. Let p 
be the indices of non-missing elements in $x_i$, let $x_i^p$ be the row obtained by 
taking only the non-missing elements of $x_i$, and let $S^p$ similarly be the submatrix 
of S* consisting of the columns with indices in p (S* is 3 x c, so each image 
corresponds to a column). Then, the ith row of B* is given by 

$$b_i = (x_i^p)(S^p)^\dagger$$

where $(S^p)^\dagger$ is the pseudo-inverse of $S^p$. With the new estimate of B* at hand, 
let $x_j$ be the jth column of X, let p be the indices of non-missing elements in 
$x_j$, and let $x_j^p$ be the column obtained by taking only the non-missing elements 
of $x_j$. Let $B^p$ similarly be the submatrix of B* consisting of rows with indices 
in p. Then, the jth light source direction is given by 

$$s_j = (B^p)^\dagger (x_j^p)$$

After the new set of light sources S* has been calculated, the last two steps 
can be repeated until the estimate of B* converges. The algorithm is very well 
behaved, converging to the global minimum within 10-15 iterations. Though it 
is possible to converge to a local minimum, we never observed this in simulation 
or in practice. 
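
The alternating scheme described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the boolean `mask` convention (True marks a valid pixel), and the fixed iteration count are assumptions. Since S* is 3 x c, the images index its columns, so the non-missing entries of a row of X select columns of S.

```python
import numpy as np

def estimate_B_missing(X, mask, n_iter=15):
    """Alternating least squares for X ~ B S with missing entries.

    X:    (n, c) data matrix, one image per column.
    mask: (n, c) boolean array, True where the pixel is valid.
    Assumes at least three fully observed rows, and at least three valid
    entries in every row and every column.
    Returns B (n, 3) and S (3, c).
    """
    full = mask.all(axis=1)                       # rows with no missing data
    # Initial light sources from the SVD of the fully observed sub-matrix.
    _, sv, Vt = np.linalg.svd(X[full], full_matrices=False)
    S = np.diag(sv[:3]) @ Vt[:3]
    B = np.zeros((X.shape[0], 3))
    for _ in range(n_iter):
        for i in range(X.shape[0]):               # fix S, solve each row of B
            p = mask[i]
            B[i] = X[i, p] @ np.linalg.pinv(S[:, p])
        for j in range(X.shape[1]):               # fix B, solve each source
            p = mask[:, j]
            S[:, j] = np.linalg.pinv(B[p]) @ X[p, j]
    return B, S
```

The pair (B, S) is only determined up to an invertible 3 x 3 matrix, so it is the product `B @ S` and the column span of B that are meaningful, not the individual entries.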



Enforcing Integrability To predict cast shadows, we must reconstruct a sur- 
face and to do this, the vector field B* must correspond to an integrable normal 
field. Since no method has been developed to enforce integrability during the 



Fig. 10. Figure 4 showed six original images of a face and three images spanning 
the linear subspace B*. a) From B* , the surface is reconstructed up to a GBR 
transformation, b) Sample images from database (left column); closest image in 
illumination cone without cast shadows (middle column); and closest image in 
illumination cone with cast shadows (right column) 



estimation of B*, we enforce it afterwards. That is, given B* computed as de- 
scribed above, we estimate a matrix $A \in GL(3)$ such that B*A corresponds to 
an integrable normal field; the development follows [42]. 

Consider a continuous surface defined as the graph of z(x, y), and let b be the 
corresponding normal field scaled by an albedo (scalar) field. The integrability 
constraint for a surface is $z_{xy} = z_{yx}$ where subscripts denote partial derivatives. 
In turn, b must satisfy: 

$$\left(\frac{b_1}{b_3}\right)_y = \left(\frac{b_2}{b_3}\right)_x$$

To estimate A such that $b^T(x, y) = b^{*T}(x, y)A$, we expand this out. Letting 
the columns of A be denoted by $A_1, A_2, A_3$ yields 

$$(b^{*T}A_3)(b_x^{*T}A_2) - (b^{*T}A_2)(b_x^{*T}A_3) = (b^{*T}A_3)(b_y^{*T}A_1) - (b^{*T}A_1)(b_y^{*T}A_3)$$

which can be expressed as 

$$b^{*T} S_1 b_x^* = b^{*T} S_2 b_y^* \qquad (8)$$

where $S_1 = A_3A_2^T - A_2A_3^T$ and $S_2 = A_3A_1^T - A_1A_3^T$. 

$S_1$ and $S_2$ are skew-symmetric matrices and each has three degrees of freedom. 
Equation 8 is linear in the six elements of $S_1$ and $S_2$. From the estimate of B* 
obtained using the method in Section 5.1, discrete approximations of the partial 
derivatives ($b_x^*$ and $b_y^*$) are computed, and then SVD is used to solve for the 
six elements of $S_1$ and $S_2$. In [42], it was shown that the elements of $S_1$ and 
$S_2$ are cofactors of A, and a simple method for computing A from the cofactors 
was presented. This procedure only determines six degrees of freedom of A. The 
other three correspond to the generalized bas relief (GBR) transformation [2] 
and can be chosen arbitrarily since GBR preserves integrability. The surface 
corresponding to B*A differs from the true surface by GBR, i.e., $z^*(x, y) = 
\lambda z(x, y) + \mu x + \nu y$ for arbitrary $\lambda, \mu, \nu$ with $\lambda \neq 0$. 
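
The linear solve for the six unknowns in Eq. 8 can be sketched as below. This is an illustration under stated assumptions, not the method of [42] in full: a skew-symmetric S is parameterized by its entries (S[0,1], S[0,2], S[1,2]), the unknown vector is recovered only up to scale as the right singular vector with the smallest singular value, and the step from the cofactors to A itself is not shown. The function names are hypothetical.

```python
import numpy as np

def skew_coeffs(b, d):
    """Coefficients of (S[0,1], S[0,2], S[1,2]) in b^T S d, row by row.

    b, d: (n, 3) arrays. For skew-symmetric S,
    b^T S d = S01 (b0 d1 - b1 d0) + S02 (b0 d2 - b2 d0) + S12 (b1 d2 - b2 d1).
    """
    return np.stack([b[:, 0] * d[:, 1] - b[:, 1] * d[:, 0],
                     b[:, 0] * d[:, 2] - b[:, 2] * d[:, 0],
                     b[:, 1] * d[:, 2] - b[:, 2] * d[:, 1]], axis=1)

def solve_S1_S2(b, bx, by):
    """Least-squares solution of b^T S1 b_x - b^T S2 b_y = 0 at every pixel.

    b, bx, by: (n, 3) arrays holding b* and its x- and y-derivatives.
    Returns the unit 6-vector (three entries of S1, then three of S2)
    together with the (n, 6) system matrix.
    """
    M = np.hstack([skew_coeffs(b, bx), -skew_coeffs(b, by)])
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[-1], M
```

In practice the derivatives would come from finite differences on the estimated B*, so the residual of the returned vector is small rather than zero.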



Generating a GBR Surface The preceding sections give a method for es- 
timating the matrix B* and then enforcing integrability; we now reconstruct 
the corresponding surface z(x, y). Note that z(x, y) is not a Euclidean recon- 
struction of the face, but a representative element of the orbit under a GBR 
transformation. Recall that both shading and shadowing will be correct for im- 
ages synthesized from a transformed surface. 

To find z(x, y), we use the variational approach presented in [19]. A surface 
z(x, y) is fit to the given components of the gradient, p (for $\partial z/\partial x$) and 
q (for $\partial z/\partial y$), by minimizing the functional 

$$\iint (z_x - p)^2 + (z_y - q)^2 \, dx\,dy,$$

whose Euler equation reduces to $\nabla^2 z = p_x + q_y$. By enforcing the right natu- 
ral boundary conditions and employing an iterative scheme that uses a discrete 
approximation of the Laplacian, we can generate the surface z(x, y) [19]. Then, 
it is a simple matter to construct an illumination cone representation that in- 
corporates cast shadows. Using ray-tracing techniques for a given light source 
direction, we can determine the cast shadow regions and correct the extreme 
rays of C*. 
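
An iterative scheme of this kind can be sketched with a Jacobi iteration on the five-point discrete Laplacian. The sketch makes a simplifying assumption that differs from [19]: the boundary values of z are taken as known and held fixed, instead of enforcing the natural boundary conditions. The function name and parameters are illustrative.

```python
import numpy as np

def integrate_gradient(p, q, z_boundary, h=1.0, n_iter=3000):
    """Solve laplacian(z) = p_x + q_y by Jacobi iteration.

    p, q:       (H, W) arrays of dz/dx and dz/dy; x runs along axis 1
                and y along axis 0.
    z_boundary: (H, W) array whose border holds the (assumed known)
                boundary values; interior entries are the initial guess.
    h:          grid spacing.
    """
    f = np.gradient(p, h, axis=1) + np.gradient(q, h, axis=0)
    z = z_boundary.astype(float)
    for _ in range(n_iter):
        # The right-hand side is evaluated in full before assignment,
        # so each pass is a true Jacobi update of the interior.
        z[1:-1, 1:-1] = 0.25 * (z[1:-1, 2:] + z[1:-1, :-2] +
                                z[2:, 1:-1] + z[:-2, 1:-1] -
                                h * h * f[1:-1, 1:-1])
    return z
```

Jacobi iteration converges slowly on large grids; a multigrid or FFT-based Poisson solver would be the usual replacement, at the cost of more code.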

Figures 4 and 10 demonstrate the process of constructing the cone C*. Figure 
4 shows the training images for one individual in the database as well as the 
columns of the matrix B*. Figure 10.a shows the reconstruction of the surface 
up to a GBR transformation. The left column of Fig. 10.b shows sample images 
in the database; the middle column shows the closest image in the illumination 
cone without cast shadows; and the right column shows the closest image in the 
illumination cone with cast shadows. Note that the background and hair have 
been masked. 

5.2 Recognition 

The cone C* can be used in a natural way for face recognition, and in experi- 
ments described below, we compare three recognition algorithms to the proposed 
method. From a set of face images labeled with the person's identity (the learn- 
ing set) and an unlabeled set of face images from the same group of people (the 
test set), each algorithm is used to identify the person in the test images. For 
more details of the comparison algorithms, see [3]. We assume that the face has 
been located and aligned within the image. 

The simplest recognition scheme is a nearest neighbor classifier in the image 
space [6] . An image in the test set is recognized (classified) by assigning to it the 
label of the closest point in the learning set, where distances are measured in 
the image space. If all of the images are normalized to have zero mean and unit 
variance, this procedure is equivalent to choosing the image in the learning set 
that best correlates with the test image. Because of the normalization process, 
the result is independent of light source intensity. 
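
A minimal sketch of this classifier follows; the function names are illustrative. After normalization every image has squared length n, so the squared distance between two images equals 2n minus twice their inner product, which is why the nearest neighbor is also the best-correlated image.

```python
import numpy as np

def normalize(img):
    """Return the image as a vector with zero mean and unit variance."""
    img = np.asarray(img, dtype=float).ravel()
    return (img - img.mean()) / img.std()

def nearest_neighbor_label(test_img, learn_imgs, labels):
    """Label of the learning-set image closest to the test image.

    learn_imgs: sequence of k images, each with the same number of pixels.
    labels:     sequence of k identities.
    """
    t = normalize(test_img)
    L = np.array([normalize(x) for x in learn_imgs])
    return labels[int(np.argmin(np.linalg.norm(L - t, axis=1)))]
```

Scaling a test image by a positive gain or adding a constant offset leaves its normalized form unchanged, which is the intensity independence noted above.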

As correlation methods are computationally expensive and require great 
amounts of storage, it is natural to pursue dimensionality reduction schemes. 
A technique now commonly used in computer vision - particularly in face recog- 
nition - is principal components analysis (PCA) which is popularly known as 
Eigenfaces [15, 27, 24, 39]. Given a collection of training images $x_i \in \mathbb{R}^n$, a 
linear projection of each image $y_i = W x_i$ to an f-dimensional feature space is 
performed. A face in a test image x is recognized by projecting x into the feature 
space, and nearest neighbor classification is performed in $\mathbb{R}^f$. The projection ma- 
trix W is chosen to maximize the scatter of all projected samples. It has been 
shown that when f equals the number of training images, the Eigenface and Cor- 
relation methods are equivalent (see [3, 27]). One proposed method for handling 
illumination variation in PCA is to discard from W the three most significant 
principal components; in practice, this yields better recognition performance [3]. 
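
The projection matrix W can be obtained from an SVD of the mean-centered training images. The sketch below is one common way to compute it, not necessarily the procedure used in the cited work; the function names and the `skip` parameter (for discarding the leading components) are illustrative.

```python
import numpy as np

def eigenface_basis(X, n_components, skip=0):
    """PCA basis of the training images.

    X: (n_pixels, n_images) matrix with one training image per column.
    Returns the mean image and W of shape (n_components, n_pixels),
    whose rows are principal directions skip, ..., skip + n_components - 1.
    """
    mean = X.mean(axis=1)
    U, _, _ = np.linalg.svd(X - mean[:, None], full_matrices=False)
    W = U[:, skip:skip + n_components].T
    return mean, W

def project(x, mean, W):
    """Feature vector y = W (x - mean) of a single image x."""
    return W @ (x - mean)
```

Calling `eigenface_basis(X, 20, skip=3)` would keep components four through twenty-three, matching the variant that drops the three most significant components.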

A third approach is to model the illumination variation of each face as a 
three-dimensional linear subspace $\mathcal{L}$ as described in Section 2.1. To perform 
recognition, we simply compute the distance of the test image to each linear 
subspace and choose the face corresponding to the shortest distance. We call 
this recognition scheme the Linear Subspace method [2]; it is a variant of the 
photometric alignment method proposed in [34] and is related to [16, 29]. While 
this models the variation in intensity when the surface is completely illuminated, 
it does not model shadowing. 
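
The distance from a test image to a three-dimensional subspace is the norm of the least-squares residual. A minimal sketch, with an illustrative function name and B taken as any n x 3 matrix whose columns span the subspace:

```python
import numpy as np

def distance_to_subspace(x, B):
    """Distance from image x (n,) to the column span of B (n, 3)."""
    coeffs, *_ = np.linalg.lstsq(B, x, rcond=None)
    return np.linalg.norm(x - B @ coeffs)
```

The coefficients are unconstrained here, so images that would require a negative combination of basis images still have zero distance; this is the sense in which the subspace ignores shadowing.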

Finally, given a test image x, recognition using illumination cones is per- 
formed by first computing the distance of the test image to each cone, and then 
choosing the face that corresponds to the shortest distance. Since each cone is 
convex, the distance can be found by solving a convex optimization problem. In 
particular, the non-negative linear least squares technique contained in Matlab 
was used in our implementation, and this algorithm has computational complex- 
ity $O(n e^2)$ where n is the number of pixels and e is the number of extreme rays. 
Two different variations for constructing the cone and a method for increasing 
the speed are considered. 
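
The distance-to-cone computation can be sketched with SciPy's non-negative least squares routine, used here as a stand-in for the Matlab routine mentioned above. The function names and the dictionary of cones are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def distance_to_cone(x, E):
    """Distance from image x (n,) to the cone spanned by the columns of E.

    E: (n, e) matrix of extreme rays. Solves min |E a - x| subject to a >= 0.
    """
    _, residual = nnls(E, x)
    return residual

def classify(x, cones):
    """Identity whose cone is closest to x.

    cones: dict mapping an identity to its (n, e) extreme-ray matrix.
    """
    return min(cones, key=lambda name: distance_to_cone(x, cones[name]))
```

Unlike the subspace distance, the coefficients are constrained to be non-negative, so only images that are non-negative combinations of the extreme rays are at distance zero.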








[Figure: illumination sphere with highlighted regions labeled Subset 1, Subset 2, Subset 3, Subset 4, Subset 5.] 



Fig. 11. The highlighted lines of longitude and latitude indicate the light source 
directions for Subsets 1 through 5. Each intersection of a longitudinal and latitu- 
dinal line on the right side of the illumination sphere has a corresponding image 
in the database. 



5.3 Experiments and Results 

To test the effectiveness of these recognition algorithms, we performed a series 
of experiments on a database from the Harvard Robotics Laboratory in which 
lighting had been systematically varied [15, 16]. In each image in this database, 
a subject held his/her head steady while being illuminated by a dominant light 
source. The space of light source directions, which can be parameterized by 
spherical angles, was then sampled in 15° increments. See Figure 11. From this 
database, we used 660 images of 10 people (66 of each). We extracted five subsets 
to quantify the effects of varying lighting. Sample images from each subset are 
shown in Fig. 9. Subset 1 (respectively 2, 3, 4, 5) contains 60 (respectively 90, 
130, 170, 210) images for which both the longitudinal and latitudinal angles 
of light source direction are within 15° (respectively 30°, 45°, 60°, 75°) of the 
camera axis. 

All of the images were cropped (96 by 84 pixels) within the face so that 
the contour of the head was excluded. For the Eigenface and correlation tests, 
the images were normalized to have zero mean and unit variance, as this im- 
proved the performance of these methods. For the Eigenface method, we used 
twenty principal components - recall that performance approaches correlation 
as the dimension of the feature space is increased [3, 27]. Since the first three 
principal components are primarily due to lighting variation and since recogni- 
tion rates can be improved by eliminating them, error rates are also presented 
when principal components four through twenty-three are used. For the cone ex- 
periments, we tested two variations: in the first variation (Cones-attached), the 
representation was constructed ignoring cast shadows by essentially using the 
illumination subspace method except that B* is estimated using the technique 
described in Section 5.1. In the second variation (Cones-cast), the representation 
was constructed using the cast shadow method as described in Section 5.1. In 
both variations, recognition was performed by choosing the face corresponding 
to the smallest computed distance to cone. 

In our quest to speed up the recognition process using cones, we also employed 
principal components analysis (PCA). The collection of all images in the cones 
(with cast shadows) is projected down to a 100-dimensional feature space. This 
is achieved by performing a linear projection of the form $y_i = W x_i$, where the 
projection matrix W is chosen to maximize the scatter of all projected samples. A 
face in an image, normalized to have zero mean and unit variance, is recognized 
by first projecting the image down to this 100-dimensional feature space and 
then performing nearest neighbor classification. 

Mirroring the extrapolation experiment described in [3], each method was 
trained on samples from Subset 1 (near frontal illumination) and then tested 
using samples from Subsets 2, 3, 4 and 5. (Note that when tested on Subset 
1, all methods performed without error). Figure 12 shows the result from this 
experiment. 



5.4 Discussion of Face Recognition Results 

From the results of this experiment, we draw the following conclusions: 

— The illumination cone representation outperforms all of the other techniques. 

— When cast shadows are included in the illumination cone, error rates are 
improved. 

— PCA of cones with cast shadows outperforms all of the other methods except 
distance to cones with cast shadows. The small degradation in error rates is 
offset by the considerable speed up of more than one order of magnitude. 

— For very extreme illumination (Subset 5), the Correlation and Eigenface 
methods completely break down, and exhibit results that are slightly bet- 
ter than chance (90% error rate). The cone method performs significantly 
better, but certainly not well enough to be usable in practice. At this point, 
more experimentation is required to determine if recognition rates can be 
improved by either using more sampled extreme rays or by improving the 
image formation model. 

6 Conclusions and Discussion 

In this chapter we have shown that the set of images of a convex object with a 
Lambertian reflectance function, under all possible lighting conditions at infinity, 
is a convex, polyhedral cone. Furthermore, we have shown that this cone can be 
learned from three properly chosen images and that the dimension of the cone 
equals the number of distinct surface normals. We have shown that for objects 
with an arbitrary reflectance function and a non-convex shape, the set of images 
is still a convex cone and that these results can be easily extended to color 
images. For non-convex Lambertian surfaces, three images are still sufficient for 

Extrapolating from Subset 1 

Method                 Error Rate (%) 
                       Subset 2   Subset 3   Subset 4   Subset 5 
                       (30°)      (45°)      (60°)      (75°) 
Correlation            2.2        46.2       74.7       86.6 
Eigenface              3.3        48.5       76.5       86.6 
Eigenface w/o 1st 3    0.0        32.3       60.0       80.6 
Linear subspace        0.0        3.9        22.4       50.8 
Cones-attached         0.0        2.3        17.1       43.8 
Cones-cast (PCA)       0.0        1.5        13.5       39.8 
Cones-cast             0.0        0.0        10.0       37.3 


Fig. 12. Extrapolation: When each of the methods is trained on images with 
near frontal illumination (Subset 1), the graph and corresponding table show 
the relative performance under more extreme light source conditions. 



constructing the cone, and this is accomplished by first reconstructing the surface 
up to a shadow-preserving generalized bas relief transformation. We have applied 
these results to develop a face recognition technique based on computing distance 
to cone, and have demonstrated that the method is superior to methods which do 
not model illumination effects, particularly the role of shadowing. Nevertheless, 
there remain a number of extensions and open issues which we discuss below. 

6.1 Interreflection 

A surface is not just illuminated by the light sources but also through inter- 
reflections from points on the surface itself [1, 13]. For a Lambertian surface, 
the image with interreflection x' is related to the image that would be formed 
without interreflection x by 



$$x' = (I - RK)^{-1} x$$

where I is the identity matrix, R is a diagonal matrix whose ith diagonal element 
denotes the albedo of facet i, and K is known as the interreflection kernel [28]. 
When there is no shadowing, all images lie in a 3-D linear space that would 
be generated from Eq. 1 by a pseudo-surface whose normals and albedo B' are 
given by $B' = (I - RK)^{-1}B$ [28, 29]. From Proposition 4, the set of all possible 
images is still a cone. While B' can be learned from only three images, the set 
of shadowing configurations and the partitioning of the illumination sphere is 
generated from B, not B' . So, it remains an open question how the cone can be 
constructed from only three images. 
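
The claim that interreflected images without shadowing still lie in a 3-D linear space can be checked numerically. The sketch below is illustrative: the function names are hypothetical, and K is any matrix for which I - RK is invertible rather than a physically derived interreflection kernel.

```python
import numpy as np

def pseudo_surface(B, albedo, K):
    """B' = (I - R K)^{-1} B, with R = diag(albedo)."""
    n = B.shape[0]
    return np.linalg.solve(np.eye(n) - np.diag(albedo) @ K, B)

def interreflected_images(B, albedo, K, S):
    """Images x' = (I - R K)^{-1} x for each column x of B S (no shadowing).

    B: (n, 3) albedo-scaled normals, albedo: (n,), K: (n, n),
    S: (3, k) light sources, one per column.
    """
    n = B.shape[0]
    return np.linalg.solve(np.eye(n) - np.diag(albedo) @ K, B @ S)
```

The result equals `pseudo_surface(B, albedo, K) @ S`, so the images have rank 3, while the shadowing configurations, as noted above, would still be determined by B and not by B'.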



6.2 Effects of Change in Pose 

All of the previous analysis in the chapter has dealt solely with variation in 
illumination. Yet, a change in the object’s pose creates a change in the per- 
ceived image. If an object undergoes a rotation or translation, how does the 
illumination cone deform? The illumination cone of the object in the new pose 
is also convex, but almost certainly different from the illumination cone of the 
object in the old pose. This raises the question: Is there a simple 
transformation, obtainable from a small number of images of the object seen from different 
views, which when applied to the illumination cone characterizes these changes? 
Alternatively, is it practical to simply sample the pose space constructing an 
illumination cone for each pose? Nayar and Murase have extended their appear- 
ance manifold representation to model illumination variation for each pose as a 
3-D linear subspace [29]. However, their representation does not account for the 
complications produced by attached shadows. 



6.3 Object Recognition 

It is important to stress that the illumination cones are convex. If they are non- 
intersecting, then the cones are linearly separable. That is, they can be separated 
by n - 1 dimensional hyperplanes in $\mathbb{R}^n$ passing through the origin. Furthermore, 
since convex sets remain convex under linear projection, for any projection 
direction lying in a separating hyperplane, the projected convex sets will also 
be linearly separable. For d different objects represented by d linearly separable 
convex cones, there always exists a linear projection of the image space to a d - 1 
dimensional space such that all of the projected sets are again linearly separable. 
So, an alternative to classification based on measuring distance to the cones in 
$\mathbb{R}^n$ is to find a much lower dimensional space in which to do classification. In our 
Fisherface method for recognizing faces under variable illumination and facial 
expression, projection directions were chosen to maximize separability of the 
object classes [3]; a similar approach can be taken here. 

The face recognition experiment was limited to the available dataset from 
the Harvard Robotics Laboratory. To perform more extensive experimentation, 
we are constructing a geodesic lighting rig that supports 64 computer controlled 
xenon strobes. Using this rig, we will be able to modify the illumination at frame 
rates and gather an extensive image database covering a broader range of lighting 
conditions including multiple sources. Note that the images in the Harvard face 
database were obtained with a single source, and so all of the images in the 
test set were on or near the boundary of the cone. Images formed with multiple 
light sources may lie in the interior, and we have not tested these methods with 
multiple light sources. Our new database will permit such experimentation. 

Abstract. In a scene observed from a fixed viewpoint, the set of shadow 
curves in an image changes as a point light source (nearby or at infinity) 
assumes different locations. We show that for any finite set of point light 
sources illuminating an object viewed under either orthographic or per- 
spective projection, there is an equivalence class of object shapes having 
the same set of shadows. Members of this equivalence class differ by a 
four parameter family of projective transformations, and the shadows 
of a transformed object are identical when the same transformation is 
applied to the light source locations. Under orthographic projection, this 
family is the generalized bas-relief (GBR) transformation, and we show 
that the GBR transformation is the only family of transformations of an 
object's shape for which the complete set of imaged shadows is identical. 
Furthermore, for objects with Lambertian surfaces illuminated by 
distant light sources, the equivalence class of object shapes which preserves 
shadows also preserves surface shading. Finally, we show that given mul- 
tiple images under differing and unknown light source directions, it is 
possible to reconstruct an object's shape up to these transformations 
from the shadows alone. 



1 Introduction 

In his fifteenth century Treatise on Painting [1], Leonardo da Vinci errs in his 
analysis of shadows while comparing painting and relief sculpture: 

As far as light and shade are concerned low relief fails both as sculpture 
and as painting, because the shadows correspond to the low nature of the 
relief, as for example in the shadows of foreshortened objects, which will 
not exhibit the depth of those in painting or in sculpture in the round. 

It is true that, when illuminated by the same light source, a relief surface 
and a surface "in the round" will cast different shadows. However, Leonardo's 
statement appears to overlook the fact that for any flattening of the surface 
relief, there is a corresponding change in the light source direction such that the 
shadows appear the same. This is not restricted to classical reliefs but, as we 
will later show, applies equally to a greater set of projective transformations. 

[Figure: four panels titled Original, Transformed, Original (2), Transformed (2).] 




Fig. 1. An illustration of the effect of applying a generalized perspective bas- 
relief (GPBR) transformation to a scene composed of a teapot resting on a 
supporting plane. The first image shows the original teapot. The second im- 
age shows the teapot after having undergone a GPBR transformation with 
( 1234 ) (0 0 0 1)) with respect to the viewpoint used to gen- 
erate the first image. (The transformation is defined in Eq. 2.) Note that 
the attached and cast shadows as well as the occluding contour are identical 
in the first two images. The third image shows the original teapot from a second 
viewpoint. The fourth image reveals the nature of the GPBR transformation, 
showing the transformed teapot from the same viewpoint as used for the third 
image. 



More specifically, when an object is viewed from a fixed viewpoint, there is a four parameter family of projective transformations of the object's structure and the light source locations such that the images of the shadows remain the same. This family of projective transformations is such that it restricts surface points to move along the lines of sight, i.e., it fixes the lines passing through the focal point. Furthermore, if the surface has Lambertian reflectance [19, 12] and is viewed orthographically, then for any of the above-mentioned transformations of the surface, there is a corresponding transformation of the surface albedo such that the surface shading remains constant.

It follows that when light source positions are unknown, neither shadows nor shading (for orthographically viewed objects with Lambertian reflectance) reveal the object's Euclidean structure. Yet in all past work on reconstruction from shadows [11, 17, 31, 6, 13, 14, 21], shape from shading [12, 23], and photometric stereo [12, 27, 30], it is explicitly assumed that the direction or location of the light source is known.

In Section 2, we explain the details of the shadowing ambiguity. We show that seen from a fixed viewpoint under perspective projection, two surfaces produce the same shadows if they differ by a particular projective transformation, which we call the Generalized Perspective Bas-Relief (GPBR) transformation. See Figure 1 for an example of this transformation. This result holds for any number of proximal or distant point light sources. Furthermore, under conditions where perspective projection can be approximated by orthographic projection, this transformation is the Generalized Bas-Relief (GBR) transformation [3].

In Section 3, we explain the details of the shading ambiguity. We show that seen from a fixed viewpoint under orthographic projection and illuminated by light sources at infinity, two surfaces produce the same shading if they differ by a GBR transformation. As with the result on shadows, this result holds for any number of point light sources.

In Section 4, we show that the GBR transformation is unique in that any two smooth surfaces which produce the same shadows must differ by a GBR. (The result is developed only for surfaces which are convex in shape.)

Finally, in Section 5, we propose an algorithm for reconstructing, from the attached shadow boundaries, the structure of an object up to a GBR transformation. The algorithm assumes that the object is viewed orthographically and that it is illuminated by a set of point light sources at infinity. We do not propose this algorithm with the belief that its present form has great applicability, but rather we give it to demonstrate that, under ideal conditions, information from shadows alone is enough to determine the structure of the object up to a GBR transformation.

2 Shadowing Ambiguity 

Let us define two objects as being shadow equivalent if there exist two sets of point light sources $S$ and $S'$ such that for every light source in $S$ illuminating one object, there exists a light source in $S'$ illuminating the second object, such that the shadowing in both images is identical. Let us further define two objects as being strongly shadow equivalent if for any light source illuminating one object, there exists a source illuminating the second object such that shadowing is identical, i.e., $S$ is the set of all point light sources. In this section we will show that two objects are shadow equivalent if they differ by a particular set of projective transformations.

Consider a camera-centered coordinate system whose origin is at the focal point, whose x- and y-axes span the image plane, and whose z-axis points in the direction of the optical axis. Let a smooth surface $f$ be defined with respect to this coordinate system and lie in the halfspace $z > 0$. Since the surface is smooth, the surface normal $\mathbf{n}(\mathbf{p})$ is defined at all points $\mathbf{p} \in f$.

We model illumination as a collection of point light sources, located nearby or at infinity. Note that this is a restriction of the lighting model presented by Langer and Zucker [20], which permits anisotropic light sources whose intensity is a function of direction. In this paper, we will represent surfaces, light sources, and the camera center as lying in either a two or three dimensional real projective space ($\mathbb{P}^2$ or $\mathbb{P}^3$). (For a concise treatment of real projective spaces, see [22].) This allows a unified treatment of both point light sources that are nearby (proximal) or distant (at infinity) and camera models that use perspective or orthographic projection.

When a point light source is proximal, its coordinates can be expressed as $\mathbf{s} = (s_x, s_y, s_z)^T$. In projective (homogeneous) coordinates, the light source $\mathsf{s} \in \mathbb{P}^3$ can be written as $\mathsf{s} = (s_x, s_y, s_z, 1)^T$. (Note that different fonts are used to distinguish between Euclidean and projective coordinates.) When a point light source is at infinity, all light rays are parallel, and so one is concerned only with the direction of the light source. The direction can be represented as a unit vector in $\mathbb{R}^3$ or as a point on an illumination sphere, $\mathbf{s} \in S^2$. In projective coordinates, the fourth homogeneous coordinate of a point at infinity is zero, and so the light source can be expressed as $\mathsf{s} = (s_x, s_y, s_z, 0)^T$. (Note that when the light source at infinity is represented in projective coordinates, the antipodal points from $S^2$ must be equated.)

For a single point source $\mathsf{s} \in \mathbb{P}^3$, let us define the set of light rays as the lines in $\mathbb{P}^3$ passing through $\mathsf{s}$. For any $\mathsf{p} \in \mathbb{P}^3$ with $\mathsf{p} \neq \mathsf{s}$, there is a single light ray passing through $\mathsf{p}$. Naturally, it is the intersection of the light rays with the surface $f$ which determines the shadows. We differentiate between two types of shadows: attached shadows and cast shadows [2, 26]. See Figures 2 and 3. A surface point $\mathsf{p}$ lies on the border of an attached shadow for light source $\mathsf{s}$ if and only if it satisfies both a local and a global condition:

Local Attached Shadow Condition: The light ray through $\mathsf{p}$ lies in the tangent plane to the surface at $\mathsf{p}$. Algebraically, this condition can be expressed as $\mathbf{n}(\mathbf{p}) \cdot (\mathbf{p} - \mathbf{s}) = 0$ for a nearby light source (here $\mathbf{p}$ and $\mathbf{s}$ denote Euclidean coordinates) and as $\mathbf{n}(\mathbf{p}) \cdot \mathbf{s} = 0$ for a distant light source (here $\mathbf{s}$ denotes the direction of the light source). A point $\mathsf{p}$ which satisfies at least the local condition is called a local attached shadow boundary point.

Global Attached Shadow Condition: The light ray does not intersect the surface between $\mathsf{p}$ and $\mathsf{s}$, i.e., the light source is not occluded at $\mathsf{p}$.
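The local attached shadow condition can be checked numerically. The following is only a minimal sketch under assumed names (`is_local_attached_boundary`, the tolerance `tol`, and the unit-sphere example are illustrative choices, not part of the paper); it tests the local condition alone and ignores the global occlusion test.

```python
def dot(u, v):
    """Dot product of two 3-vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))


def is_local_attached_boundary(n, p, s, distant, tol=1e-9):
    """Test the local attached shadow condition at surface point p.

    n: surface normal at p; p: Euclidean point on the surface;
    s: light position (nearby source) or light direction (distant source).
    tol is an illustrative numerical tolerance (an assumption).
    """
    if distant:
        return abs(dot(n, s)) <= tol            # n(p) . s = 0
    ray = [pc - sc for pc, sc in zip(p, s)]     # p - s
    return abs(dot(n, ray)) <= tol              # n(p) . (p - s) = 0


# Hypothetical example: unit sphere centred at the origin, so the normal
# at a surface point p is p itself.
equator_point = (1.0, 0.0, 0.0)
pole = (0.0, 0.0, 1.0)
light_dir = (0.0, 0.0, 1.0)   # distant source along +z

on_boundary = is_local_attached_boundary(equator_point, equator_point, light_dir, True)
at_pole = is_local_attached_boundary(pole, pole, light_dir, True)
```

For the distant source along +z, the equator point satisfies the condition while the pole, which faces the light directly, does not.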

Now consider applying an arbitrary projective transformation $a : \mathbb{P}^3 \to \mathbb{P}^3$ to both the surface and the light source. Under this transformation, let $\mathsf{p}' = a(\mathsf{p})$ and $\mathsf{s}' = a(\mathsf{s})$.

Lemma 1. A point $\mathsf{p}$ on a smooth surface $f$ is a local attached shadow boundary point for point light source $\mathsf{s}$ iff $\mathsf{p}'$ on a transformed surface $f'$ is a local attached shadow boundary point for point light source $\mathsf{s}'$.

Proof. At a local attached shadow boundary point $\mathsf{p}$, the line defined by $\mathsf{p} \in \mathbb{P}^3$ and light source $\mathsf{s} \in \mathbb{P}^3$ lies in the tangent plane at $\mathsf{p}$. Since the order of contact (e.g., tangency) of a curve and surface is preserved under projective transformations, the line defined by $\mathsf{p}'$ and $\mathsf{s}'$ lies in the tangent plane at $\mathsf{p}'$.

Cast shadows occur at points on the surface that face the light source, but where some other portion of the surface lies between the shadowed points and the light source. A point $\mathsf{p}$ lies on the boundary of a cast shadow for light source $\mathsf{s}$ if and only if it similarly satisfies both a local and a global condition:

Local Cast Shadow Condition: The light ray through $\mathsf{p}$ grazes the surface at some other point $\mathsf{q}$ (i.e., $\mathsf{q}$ lies on an attached shadow boundary). A point $\mathsf{p}$ which satisfies at least the local condition is called a local cast shadow boundary point.

Global Cast Shadow Condition: The only intersection of the surface and the light ray between $\mathsf{p}$ and $\mathsf{s}$ is at $\mathsf{q}$.

Lemma 2. A point $\mathsf{p}$ on a smooth surface $f$ is a local cast shadow boundary point for point light source $\mathsf{s}$ iff $\mathsf{p}'$ on a transformed surface $f'$ is a local cast shadow boundary point for point light source $\mathsf{s}'$.

Proof. For a local cast shadow boundary point $\mathsf{p} \in \mathbb{P}^3$ and light source $\mathsf{s} \in \mathbb{P}^3$, there exists another point $\mathsf{q} \in \mathbb{P}^3$ on the line defined by $\mathsf{p}$ and $\mathsf{s}$ such that $\mathsf{q}$ lies on an attached shadow. Since collinearity is preserved under projective transformations, $\mathsf{p}'$, $\mathsf{q}'$ and $\mathsf{s}'$ are collinear. Hence, from Lemma 1, $\mathsf{q}'$ is also an attached shadow point.

Taken together, Lemmas 1 and 2 indicate that under an arbitrary projective transformation of a surface and light source, the set of local shadow curves is a projective transformation of the local shadow curves of the original surface and light source. However, these two lemmas do not imply that the two surfaces are shadow equivalent, since the transformed points may project to different image points, or the global conditions may not hold.



2.1 Perspective Projection: GPBR 

We will further restrict the set of projective transformations. Modeling the camera as a function $\pi : \mathbb{P}^3 \to \mathbb{P}^2$, we require that for any point $\mathsf{p}$ on the surface, $\pi(\mathsf{p}) = \pi(a(\mathsf{p}))$, where $a$ is a projective transformation; that is, $\mathsf{p}$ and $a(\mathsf{p})$ must project to the same image point. We will consider two specific camera models in turn: perspective projection $\pi_p$ and orthographic projection $\pi_o$.

Without loss of generality, consider a pinhole perspective camera with unit focal length located at the origin of the coordinate system and with the optical axis pointed in the direction of the z-axis. Letting the homogeneous coordinates of an image point be given by $\mathsf{u} \in \mathbb{P}^2$, pinhole perspective projection of $\mathsf{p} \in \mathbb{P}^3$ is given by $\mathsf{u} = \Pi_p \mathsf{p}$, where

\[
\Pi_p = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}. \tag{1}
\]

For $\pi_p(\mathsf{p}) = \pi_p(a(\mathsf{p}))$ to be true for any point $\mathsf{p}$, the transformation must move $\mathsf{p}$ along the optical ray between the camera center and $\mathsf{p}$. This can be accomplished by the projective transformation $a : \mathsf{p} \mapsto G\mathsf{p}$, where

\[
G = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ g_1 & g_2 & g_3 & g_4 \end{bmatrix}. \tag{2}
\]

Fig. 2. In this 2-d illustration of the generalized perspective bas-relief transformation (GPBR), the lower shadow is an attached shadow while the upper one is composed of both attached and cast components. A GPBR transformation has been applied to the left surface, yielding the right one. Note that under GPBR, all surface points and the light source are transformed along the optical rays through the center of projection. By transforming the light source from $\mathsf{s}$ to $\mathsf{s}'$, the shadows are preserved.

We call this transformation the Generalized Perspective Bas-Relief (GPBR) transformation. In Euclidean coordinates, the transformed surface and light source are given by

\[
\mathbf{p}' = \frac{1}{\mathbf{g} \cdot \mathbf{p} + g_4}\,\mathbf{p}, \qquad \mathbf{s}' = \frac{1}{\mathbf{g} \cdot \mathbf{s} + g_4}\,\mathbf{s}, \tag{3}
\]

where $\mathbf{g} = (g_1, g_2, g_3)^T$. Figure 2 shows a 2-d example of GPBR being applied to a planar curve and a single light source. The effect is to move points on the surface and the light sources along lines through the camera center in a manner that preserves shadows. The sign of $\mathbf{g} \cdot \mathbf{p} + g_4$ plays a critical role: if it is positive, all points on $f$ move inward or outward from the camera center, remaining in the halfspace $z > 0$. On the other hand, if the sign is negative for some points on $f$, these points will move through the camera center to points with $z < 0$, i.e., they will not be visible to the camera. The equation $\mathbf{g} \cdot \mathbf{p} + g_4 = 0$ defines a plane which divides $\mathbb{R}^3$ into these two cases; all points on this plane map to the plane at infinity. A similar effect on the transformed light source location is determined by the sign of $\mathbf{g} \cdot \mathbf{s} + g_4$.
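The claim that a GPBR moves points along lines of sight can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the numeric values of $\mathbf{g}$, $g_4$, the point and the source are assumptions chosen for the example, and the perspective camera is the unit-focal-length pinhole described above.

```python
def gpbr(p, g, g4):
    """Apply a GPBR transformation to a Euclidean point p = (x, y, z).

    Equivalent to multiplying (x, y, z, 1) by the matrix of Eq. 2 and
    dehomogenising, i.e. p' = p / (g . p + g4) as in Eq. 3.
    """
    w = g[0] * p[0] + g[1] * p[1] + g[2] * p[2] + g4
    if w == 0:
        raise ValueError("point maps to the plane at infinity")
    return tuple(c / w for c in p)


def perspective(p):
    """Pinhole projection with unit focal length: (x/z, y/z)."""
    return (p[0] / p[2], p[1] / p[2])


g, g4 = (0.1, 0.2, 0.05), 1.0      # illustrative GPBR parameters
p = (1.0, 2.0, 4.0)                # surface point in the halfspace z > 0
s = (-3.0, 1.0, 2.0)               # proximal light source

p_new = gpbr(p, g, g4)
s_new = gpbr(s, g, g4)

# Both the point and the source keep their image coordinates, because
# each is only rescaled along its own ray through the camera center.
same_image_point = perspective(p), perspective(p_new)
```

Because the denominator $\mathbf{g} \cdot \mathbf{p} + g_4$ is positive for this point, the transformed point stays in front of the camera.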



Proposition 1. The image of the shadow curves for a surface $f$ and light source $\mathsf{s}$ is identical to the image of the shadow curves for a surface $f'$ and light source $\mathsf{s}'$ transformed by a GPBR if $\mathbf{g} \cdot \mathbf{s} + g_4 > 0$ and $\mathbf{g} \cdot \mathbf{p} + g_4 > 0$ for all $\mathbf{p} \in f$.

Proof. Since GPBR is a projective transformation, Lemmas 1 and 2 show that the local attached and cast shadow curves on the transformed surface $f'$ from light source $\mathsf{s}'$ are a GPBR transformation of the local shadow curves on $f$ from light source $\mathsf{s}$. For any point $\mathsf{p}$ on the surface and any GPBR transformation $G$, we have $\Pi_p \mathsf{p} = \Pi_p G \mathsf{p}$, and so the images of the local shadow curves are identical.

To show that the global condition for an attached shadow is also satisfied, we note that projective transformations preserve collinearity; therefore, the only intersections of the line defined by $\mathsf{s}'$ and $\mathsf{p}'$ with $f'$ are transformations of the intersections of the line defined by $\mathsf{s}$ and $\mathsf{p}$ with $f$. Within each light ray (a projective line), the points are subjected to a projective transformation; in general, the order of the transformed intersection points on the line may be a combination of a cyclic permutation and a reversal of the order of the original points. However, the restriction that $\mathbf{g} \cdot \mathbf{p} + g_4 > 0$ for all $\mathbf{p} \in f$ and that $\mathbf{g} \cdot \mathbf{s} + g_4 > 0$ has the effect of preserving the order of points between $\mathsf{p}$ and $\mathsf{s}$ on the original line and between $\mathsf{p}'$ and $\mathsf{s}'$ on the transformed line.

It should be noted for GPBR that for any $\mathbf{g}$ and $g_4$, there exists a light source $\mathbf{s}$ such that $\mathbf{g} \cdot \mathbf{s} + g_4 < 0$. When $f$ is illuminated by such a source, the transformed source passes through the camera center, and the global shadowing conditions may not be satisfied. Hence, two objects differing by GPBR are not strongly shadow equivalent. On the other hand, for any bounded set of light sources and bounded object $f$, there exists a set of $g_1, \ldots, g_4$ such that $\mathbf{g} \cdot \mathbf{s} + g_4 > 0$ and $\mathbf{g} \cdot \mathbf{p} + g_4 > 0$. Hence, there exists a set of objects which are shadow equivalent.

Since the shadow curves of multiple light sources are the union of the shadow curves from the individual light sources, this also holds for multiple light sources. It should also be noted that the occluding contours (silhouettes) of $f$ and $f'$ are identical, since the camera center is a fixed point under GPBR and the occluding contour is the same as the attached shadow curve produced by a light source located at the camera center.

Figure 1 shows an example of the GPBR transformation being applied to a scene containing a teapot resting on a support plane. The images were generated using a ray tracing package; the scene contained a single proximal point light source, the surfaces were modeled as Lambertian, and a perspective camera model was used. When the light source is transformed with the surface, the shadows are the same for both the original and transformed scenes. Even the shading is similar in both images, so much so that it is nearly impossible to distinguish the two surfaces. However, from another viewpoint, the effect of the GPBR transformation on the object's shape is apparent.

This result complements past work on structure from motion in which the aim of structure recovery is a weaker, non-Euclidean representation, such as affine [1?, 24, 2?, 2?], projective [9], or ordinal [10].



2.2 Orthographic Projection: GBR 

When a camera is distant and can be modeled as orthographic projection, the visual rays are all parallel to the direction of the optical axis. In $\mathbb{P}^3$, these rays intersect at the camera center, which is a point at infinity. Without loss of generality, consider the viewing direction to be in the direction of the z-axis and the x- and y-axes to span the image plane. Again, letting the homogeneous coordinates of an image point be given by $\mathsf{u} \in \mathbb{P}^2$, orthographic projection of $\mathsf{p} \in \mathbb{P}^3$ can be expressed as $\mathsf{u} = \Pi_o \mathsf{p}$, where

\[
\Pi_o = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{4}
\]

Fig. 3. The image points that lie in shadow for a surface under light source $\mathsf{s}$ are identical to those in shadow for a transformed surface under light source $\mathsf{s}'$. In this 2-d illustration, the lower shadow is an attached shadow while the upper one is composed of both attached and cast components. A generalized bas-relief transformation with both flattening and an additive plane has been applied to the left surface, yielding the right one.

Now, let us consider another set of projective transformations $g : \mathbb{P}^3 \to \mathbb{P}^3$. For $\pi_o(\mathsf{p}) = \pi_o(g(\mathsf{p}))$ to be true for any point $\mathsf{p}$, the transformation must move $\mathsf{p}$ along the viewing direction. This can be accomplished by the projective transformation $g : \mathsf{p} \mapsto G\mathsf{p}$, where

\[
G = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ g_1 & g_2 & g_3 & g_4 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{5}
\]

with $g_3 > 0$. The mapping $g$ is an affine transformation which was introduced in [3] and was called the generalized bas-relief (GBR) transformation. Consider the effect of applying GBR to a surface parameterized as the graph of a depth function, $(x, y, f(x, y))$. This yields a transformed surface

\[
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} x \\ y \\ g_1 x + g_2 y + g_3 f(x, y) + g_4 \end{bmatrix}. \tag{6}
\]

See Figure 3 for an example. The parameter $g_3$ has the effect of scaling the relief of the surface, $g_1$ and $g_2$ characterize an additive plane, and $g_4$ provides a depth offset. As described in [3], when $g_1 = g_2 = 0$ and $0 < g_3 < 1$, the resulting transformation is simply a compression of the surface relief, as in relief sculpture.

Proposition 2. The image of the shadow curves for a surface $f$ and light source $\mathsf{s}$ are identical to the image of the shadow curves for a surface $f'$ and light source $\mathsf{s}'$ transformed by any GBR.

Proof. The proof follows that of Proposition 1.

It should be noted that Proposition 2 applies to both nearby light sources and those at infinity. However, in contrast to the GPBR transformation, nearby light sources do not move to infinity, nor do light sources at infinity become nearby light sources, since GBR is an affine transformation which fixes the plane at infinity. Since Proposition 2 holds for any light source, all objects differing by a GBR transformation are strongly shadow equivalent.

An implication of Propositions 1 and 2 is that when an object is observed from a fixed viewpoint (whether under perspective or orthographic projection), one can at best reconstruct its surface up to a four parameter family of transformations (GPBR or GBR) from shadow or occluding contour information, irrespective of the number of images and number of light sources. Under the same conditions, it is impossible to distinguish (recognize) two objects that differ by these transformations from shadows or silhouettes.

3 Shading Ambiguity 

Let us define two objects as being strongly shading equivalent if for any light source illuminating one object, there exists a source illuminating the second object such that shading is identical. In this section we will show that two objects with surfaces having Lambertian reflectance [19, 12] are strongly shading equivalent if they differ by any of the set of GBR transformations described in the previous section. Here we consider distant illumination (parallel illuminating rays) of objects viewed under orthographic projection (parallel lines of sight).

Consider again a camera-centered coordinate system whose origin is at the focal point, whose x- and y-axes span the image plane, and whose z-axis points in the direction of the optical axis. In this coordinate system, the depth of every visible point in the scene can be expressed as

\[
z = f(x, y)
\]

where $f$ is a piecewise differentiable function. The graph $(x, y, f(x, y))$ defines a surface which will also be denoted by $f$. The direction of the inward pointing surface normal $\mathbf{n}(x, y)$ can be expressed as

\[
\mathbf{n}(x, y) = \begin{bmatrix} -f_x \\ -f_y \\ 1 \end{bmatrix} \tag{7}
\]

where $f_x$ and $f_y$ denote the partial derivatives of $f$ with respect to $x$ and $y$ respectively.

Once we restrict ourselves to orthographic projection, we no longer need the full machinery of projective coordinates. A visible point $\mathbf{p}$ on the surface has Euclidean coordinates $\mathbf{p} = (x, y, f(x, y))^T$. As done in Eq. 6, we write the GBR transformation on a surface point $\mathbf{p}$ as $\mathbf{p}' = G\mathbf{p} + (0, 0, g_4)^T$, where $G$ has been rewritten as

\[
G = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ g_1 & g_2 & g_3 \end{bmatrix}. \tag{8}
\]

Under the matrix product operation, the set $\mathrm{GBR} = \{G\}$ forms a subgroup of $GL(3)$ with

\[
G^{-1} = \frac{1}{g_3} \begin{bmatrix} g_3 & 0 & 0 \\ 0 & g_3 & 0 \\ -g_1 & -g_2 & 1 \end{bmatrix}.
\]

Also, note that for image point $(x, y)$, the relation between the direction of the surface normal of $f'$ and $f$ is given by $\mathbf{n}'(x, y) = G^{-T}\mathbf{n}(x, y)$, where $G^{-T} \equiv (G^T)^{-1} = (G^{-1})^T$. (As shown in [3], this is the only linear transformation of the surface's normal field which preserves integrability.)
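The relation between normals and sources under GBR can be illustrated numerically. The sketch below is an assumption-laden example (function names and numeric values are hypothetical): it uses the normal direction $(-f_x, -f_y, 1)$ of a depth map, the gradient of the transformed depth $g_1 x + g_2 y + g_3 f + g_4$, and a distant source transformed as $\mathbf{s}' = G\mathbf{s}$ with the $3 \times 3$ GBR matrix. It checks that the unnormalised normals satisfy $\mathbf{n}' \cdot \mathbf{s}' = g_3\,(\mathbf{n} \cdot \mathbf{s})$, so the local attached shadow condition $\mathbf{n} \cdot \mathbf{s} = 0$ is preserved, and the sign is preserved when $g_3 > 0$.

```python
def normal(fx, fy):
    """Inward normal direction (-f_x, -f_y, 1) of the graph z = f(x, y)."""
    return (-fx, -fy, 1.0)


def gbr_gradient(fx, fy, g1, g2, g3):
    """Gradient of the transformed depth f' = g1*x + g2*y + g3*f + g4."""
    return (g3 * fx + g1, g3 * fy + g2)


def gbr_light(s, g1, g2, g3):
    """Transformed distant source s' = G s, with G the 3x3 GBR matrix."""
    return (s[0], s[1], g1 * s[0] + g2 * s[1] + g3 * s[2])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


g1, g2, g3 = 1.0, 0.5, 0.25      # illustrative GBR parameters, g3 > 0
fx, fy = 0.5, -1.0               # surface gradient at one image point
s = (2.0, -1.0, 2.0)             # source direction chosen so that n . s = 0

n = normal(fx, fy)
n_t = normal(*gbr_gradient(fx, fy, g1, g2, g3))
s_t = gbr_light(s, g1, g2, g3)

before = dot(n, s)       # zero: the point lies on an attached shadow boundary
after = dot(n_t, s_t)    # also zero after transforming surface and source
```

Note that `n_t` here equals $g_3\,G^{-T}\mathbf{n}$, which has the same direction as $G^{-T}\mathbf{n}$ for $g_3 > 0$.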

Letting the albedo of a Lambertian surface $f$ be denoted by $a(x, y)$, the intensity image produced by a light source $\mathbf{s}$ can be expressed as

\[
I_{f,a,\mathbf{s}}(x, y) = \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, \mathbf{s}
\]

where $\mathbf{b}(x, y)$ is the product of the albedo $a(x, y)$ of the surface and the inward pointing unit surface normal $\hat{\mathbf{n}}(x, y)$; the vector $\mathbf{s}$ denotes a point light source at infinity, with the magnitude of $\mathbf{s}$ proportional to the intensity of the light source; and $\Psi_{f,\mathbf{s}}(x, y)$ is a binary function such that

\[
\Psi_{f,\mathbf{s}}(x, y) = \begin{cases} 0 & \text{if } f(x, y) \text{ is shadowed} \\ 1 & \text{otherwise.} \end{cases}
\]

We now show that shading on a surface $f$ for some light source $\mathbf{s}$ is identical to that on a GBR transformed surface $f'$ for light source $\mathbf{s}' = G\mathbf{s}$ when $f'$ has albedo $a'(x, y)$ given by

\[
a' = \frac{a}{g_3} \sqrt{\frac{(g_3 f_x + g_1)^2 + (g_3 f_y + g_2)^2 + 1}{f_x^2 + f_y^2 + 1}}. \tag{9}
\]

The effect of applying Eq. 9 to a classical bas-relief transformation ($0 < g_3 < 1$) is to darken points on the surface where $\mathbf{n}$ points away from the optical axis.

This transformation on the albedo is subtle and warrants discussion. For $g_3$ close to unity, the transformation on the albedo is nearly impossible to detect. That is, if you transform the shape of a surface by a GBR transformation but leave the albedo unchanged, then the differences in the images produced under varying illumination are too small to reveal the difference in the structure. In Fig. 4, we left the albedo unchanged, $a'(x, y) = a(x, y)$, and even though $g_3$ ranges from 0.5 to 1.5, the differences in shape cannot be discerned from the frontal images. However, when the albedo is unchanged and the flattening is more severe, e.g., tenfold ($g_3 = 0.1$), the shading patterns can reveal the flatness of the surface. This effect is often seen on very low relief sculptures (e.g., Donatello's rilievo schiacciato), which reproduce shadowing accurately, but shading poorly.

Note that for the shadowing to be identical it is necessary that $g_3 > 0$. When $g_3 < 0$, the surface $f'$ is inverted (as in intaglio). For a corresponding transformation of the light source $\mathbf{s}'$, the illuminated regions of the original surface $f$ and the transformed surface $f'$ will be the same if the albedo is transformed as described above. (This is the well known "up/down" (convex/concave) ambiguity.) However, the shadows cast by $f'$ and $f$ may differ quite dramatically.

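Proposition 3. For each light source $\mathbf{s}$ illuminating a Lambertian surface $f(x, y)$ with albedo $a(x, y)$, there exists a light source $\mathbf{s}'$ illuminating a surface $f'(x, y)$ (a GBR transformation of $f$) with albedo $a'(x, y)$ (as given in Eq. 9), such that $I_{f,a,\mathbf{s}}(x, y) = I_{f',a',\mathbf{s}'}(x, y)$.

Proof. The image of $f$ is given by

\[
I_{f,a,\mathbf{s}}(x, y) = \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, \mathbf{s}.
\]

For any $3 \times 3$ invertible matrix $A$, we have that

\[
I_{f,a,\mathbf{s}}(x, y) = \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, A^{-1} A\, \mathbf{s}.
\]

Since GBR is a subgroup of $GL(3)$ and $\Psi_{f,\mathbf{s}}(x, y) = \Psi_{f',\mathbf{s}'}(x, y)$,

\[
\begin{aligned}
I_{f,a,\mathbf{s}}(x, y) &= \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, G^{-1} G\, \mathbf{s} \\
&= \Psi_{f',\mathbf{s}'}(x, y)\, \mathbf{b}'^T(x, y)\, \mathbf{s}' \\
&= I_{f',a',\mathbf{s}'}(x, y)
\end{aligned}
\]

where $\mathbf{b}'(x, y) = G^{-T}\mathbf{b}(x, y)$ and $\mathbf{s}' = G\mathbf{s}$.

Hence, two objects with Lambertian surfaces differing in shape by a GBR transformation and differing in albedo by Eq. 9 are indeed strongly shading equivalent. The three preceding propositions have shown that when a Lambertian surface $f$ with albedo $a(x, y)$ is illuminated by a single light source, the set of images it can produce by varying the light source strength and direction are equivalent to those produced by a GBR transformed surface $f'$ with albedo $a'(x, y)$ given by Eq. 9. Yet, due to the superposition of images, this result holds not simply for images produced by a single point light source, but also for images produced by any, possibly infinite, combination of point light sources.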
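The shading equivalence can be spot-checked at a single image point. This is a sketch under stated assumptions, not the authors' code: the helper names and all numeric values are hypothetical, the point is assumed unshadowed (so the binary shadow function equals 1), and the albedo transformation is the square-root expression of Eq. 9 as it follows from $\mathbf{b}' = G^{-T}\mathbf{b}$.

```python
import math


def shading(fx, fy, albedo, s):
    """Lambertian intensity b . s at an unshadowed point, b = albedo * unit normal."""
    n = (-fx, -fy, 1.0)
    norm = math.sqrt(sum(c * c for c in n))
    return albedo * sum(nc * sc for nc, sc in zip(n, s)) / norm


def gbr_albedo(fx, fy, albedo, g1, g2, g3):
    """Transformed albedo a' following Eq. 9."""
    num = (g3 * fx + g1) ** 2 + (g3 * fy + g2) ** 2 + 1.0
    den = fx ** 2 + fy ** 2 + 1.0
    return (albedo / g3) * math.sqrt(num / den)


g1, g2, g3 = 0.3, -0.2, 0.5          # illustrative GBR parameters
fx, fy, a = 0.4, -1.0, 0.8           # gradient and albedo at one pixel
s = (0.2, 0.1, 1.5)                  # distant source; magnitude encodes intensity

fx_t, fy_t = g3 * fx + g1, g3 * fy + g2                 # gradient of f'
a_t = gbr_albedo(fx, fy, a, g1, g2, g3)                 # albedo a'
s_t = (s[0], s[1], g1 * s[0] + g2 * s[1] + g3 * s[2])   # s' = G s

I_original = shading(fx, fy, a, s)
I_transformed = shading(fx_t, fy_t, a_t, s_t)
```

Up to floating-point rounding the two intensities agree, which is the pointwise content of the proposition for a single source.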

These results demonstrate that when both the surface and light source directions are transformed by $G$, both the shadowing and shading are identical in the images of the original and transformed surface. An implication of this result is that given any number of images taken from a fixed viewpoint, neither a computer vision algorithm nor a biological process can distinguish two objects that differ by a GBR transformation. Knowledge (or assumptions) about surface shape, surface albedo, light source direction, or light source intensity must be employed to resolve this ambiguity. See again Fig. 4.

Fig. 4. Three-dimensional data for a human head was obtained using a laser scan (Cyberware) and rendered (top row) as a Lambertian surface with constant albedo (equal grey values for all surface points). The subsequent three rows show images of this head whose shape has been transformed by a different generalized bas-relief transformation, but whose albedo has not been transformed. The profile views of the face in the third column reveal the nature of the individual transformations and the direction of the light source. The top row is the true shape; the second from top is a flattened shape ($g_3 = 0.5$) (as are classical bas-reliefs); the third is an elongated shape ($g_3 = 1.5$); and the bottom is a flattened shape plus an additive plane ($g_3 = 0.7$, $g_2 = 0.5$, and $g_1 = 0.0$). The first column shows frontal views of the faces in the third column. From this view the true 3-d structure of the objects cannot be determined; in each image the shadowing patterns are identical, and even though the albedo has not been transformed according to Eq. 9, the shading patterns are so close as to provide few cues as to the true structure. The second column shows near frontal views of the faces in the first column, after having been separately rotated to compensate for the degree of flattening or elongation. Note that even small rotations appear not to reveal the 3-d structure.

4 Uniqueness of the Generalized Bas-Relief 
Transformation 

Here we prove that under orthographic projection the generalized bas-relief (GBR) transformation is unique in that there is no other transformation of an object's surface which preserves the set of shadows produced by illuminating the object with all possible point sources at infinity. We consider only the simplest case, an object with convex shape casting no shadows on its own surface, and show that the set of attached shadow boundaries is preserved only under a GBR transformation of the object's surface.

Recall that an attached shadow boundary is defined as the contour of points $(x, y, f(x, y))^T$ satisfying $\mathbf{n} \cdot \mathbf{s} = 0$ for some $\mathbf{s}$. For a convex object, the global attached shadow condition holds everywhere. Here the magnitude and the sign of the light source are unimportant, as neither affects the location of the attached shadow boundary. Thus, let the vector $\mathbf{s} = (s_x, s_y, s_z)^T$ denote in homogeneous coordinates a point light source at infinity, where all light sources producing the same attached shadow boundary are equated, i.e., $(s_x, s_y, s_z)^T \equiv (k s_x, k s_y, k s_z)^T$ for all $k \in \mathbb{R}$, $k \neq 0$. With this, the space of light source directions $\mathcal{S}$ is equivalent to the real projective plane ($\mathbb{RP}^2$), with the line at infinity given by coordinates of the form $(s_x, s_y, 0)^T$. Note that in Section 2, we represented light sources as points in $\mathbb{P}^3$; here, we restrict ourselves only to distant light sources lying in the plane at infinity of $\mathbb{P}^3$ (a real projective plane).

Let $\mathbf{n} = (n_x, n_y, n_z)^T$ denote the direction of a surface normal. Again, the magnitude and sign are unimportant, so we have $(n_x, n_y, n_z)^T \equiv (k n_x, k n_y, k n_z)^T$ for all $k \in \mathbb{R}$, $k \neq 0$. Thus, the space of surface normals $\mathcal{N}$ is likewise equivalent to $\mathbb{RP}^2$. Note that under the equation $\mathbf{n} \cdot \mathbf{s} = 0$, the surface normals are the dual of the light sources. Each point in the $\mathbb{RP}^2$ of light sources has a corresponding line in the $\mathbb{RP}^2$ of surface normals, and vice versa.

Let us now consider the image contours defined by the points $(x, y)$ satisfying $\mathbf{n} \cdot \mathbf{s} = 0$ for some $\mathbf{s}$. These image contours are the attached shadow boundaries orthographically projected onto the image plane. For lack of a better name, we will refer to them as the imaged attached shadow boundaries.

The set of imaged attached shadow boundaries for a convex object forms an abstract projective plane $\mathbb{P}^2$, where a "point" in the abstract projective plane is a single attached shadow boundary, and a "line" in the abstract projective plane is the collection of imaged attached shadow boundaries passing through a common point in the image plane. To see this, note the obvious projective isomorphism between the real projective plane of light source directions $\mathcal{S}$ and the abstract projective plane of imaged attached shadow boundaries $\mathbb{P}^2$. Under this isomorphism, we have bijections mapping points to points and lines to lines.

Fig. 5. The relation of different spaces in the proof of Proposition 4.

Now let us say that we are given two objects whose visible surfaces are described by respective functions $f(x, y)$ and $f'(x, y)$. If the objects have the same set of imaged attached shadow boundaries as seen in the image plane (i.e., if the objects are strongly shadow equivalent), then the question arises: How are the two surfaces $f(x, y)$ and $f'(x, y)$ related?

Proposition 4. If the visible surfaces of two convex objects $f$ and $f'$ are strongly shadow equivalent, then the surfaces are related by a generalized bas-relief transformation.

Proof. As illustrated in Figure 5, we can construct a projective isomorphism between the set of imaged attached shadow boundaries $\mathbb{P}^2$ and the real projective plane of light source directions $\mathcal{S}$ illuminating surface $f(x, y)$. The isomorphism is chosen to map the collection of imaged attached shadow boundaries passing through a common point $(x, y)$ in the image plane (i.e., a line in $\mathbb{P}^2$) to the surface normal $\mathbf{n}(x, y)$. In the same manner, we can construct a projective isomorphism between $\mathbb{P}^2$ and the real projective plane of light source directions $\mathcal{S}'$ illuminating the surface $f'(x, y)$. The isomorphism is likewise chosen to map the same collection of imaged attached shadow boundaries passing through $(x, y)$ in the image plane to the surface normal $\mathbf{n}'(x, y)$. Under these two mappings, we have a projective isomorphism between $\mathcal{S}$ and $\mathcal{S}'$, which in turn is a projective transformation (collineation) [1]. Because $\mathcal{N}$ and $\mathcal{N}'$ are the duals of $\mathcal{S}$ and $\mathcal{S}'$ respectively, the surface normals of $f(x, y)$ are also related to the surface normals of $f'(x, y)$ by a projective transformation, i.e., $\mathbf{n}'(x, y) = P\mathbf{n}(x, y)$, where $P$ is a $3 \times 3$ invertible matrix.

The transformation $P$ is further restricted in that the surface normals along the occluding contour of $f$ and $f'$ are equivalent, i.e., the transformation $P$ pointwise fixes the line at infinity of surface normals. Thus, $P$ must be of the form

\[
P = \begin{bmatrix} 1 & 0 & p_1 \\ 0 & 1 & p_2 \\ 0 & 0 & p_3 \end{bmatrix}
\]

where $p_3 \neq 0$. The effect of applying $P$ to the surface normals is the same as applying $G$ in Eq. 5 to the surface if $p_1 = -g_1/g_3$, $p_2 = -g_2/g_3$ and $p_3 = 1/g_3$. That is, $P$ has the form of the generalized bas-relief transformation. Note that the shadows are independent of the translation $g_4$ along the line of sight under orthographic projection.
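As a consistency check on this parameter identification (a short worked step, assuming only the $3 \times 3$ GBR matrix with rows $(1, 0, 0)$, $(0, 1, 0)$, $(g_1, g_2, g_3)$ and the normal relation $\mathbf{n}' = G^{-T}\mathbf{n}$ from Section 3), inverting and transposing that matrix gives:

```latex
\[
G^{-T}
= \frac{1}{g_3}
  \begin{bmatrix} g_3 & 0 & -g_1 \\ 0 & g_3 & -g_2 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & -g_1/g_3 \\ 0 & 1 & -g_2/g_3 \\ 0 & 0 & 1/g_3 \end{bmatrix},
\qquad
G^{-T} \begin{bmatrix} n_x \\ n_y \\ 0 \end{bmatrix}
= \begin{bmatrix} n_x \\ n_y \\ 0 \end{bmatrix}.
\]
```

The first equality reproduces the stated values of the three free entries in the last column, and the second shows that normals with zero third component, i.e., those along the occluding contour, are left fixed pointwise.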



5 Reconstruction from Attached Shadows 

In the previous section, we showed that under orthographic projection with distant light sources, the only transformation of a surface which preserves the set of imaged shadow contours is the generalized bas-relief transformation. However, Proposition 4 does not provide a prescription for actually reconstructing a surface up to GBR. In this section, we consider the problem of reconstruction from the attached shadow boundaries measured in $n$ images of a surface, each illuminated by a single distant light source. We will show that it is possible to estimate the $n$ light source directions and the surface normals at a finite number of points, all up to GBR. In general, we expect to reconstruct the surface normals at $O(n^2)$ points. From the reconstructed normals, an approximation to the underlying surface can be computed for a fixed GBR. Alternatively, existing shape-from-shadow methods can be used to reconstruct the surface from the estimated light source directions (for a fixed GBR) and from the measured attached and cast shadow curves [11, 17, 31].

irst onsi er the o lu ing ontour (silhouette) o sur e whi h will e 
enote 0 his ontour is e uiv lent to the tt he sh ow pro u e y 

light sour e whose ire tion is the viewing ire tion efine oor in te system 
with X n y sp nning the im ge pi ne n with z pointing in the viewing 
ire tion or 11 points p on the o lu ing ontour the viewing ire tion lies 

in the t ngent pi ne (i e n(p) -z 0) n the sur e norm 1 n(p) is p r llel 

to the im ge norm 1 en e i the norm 1 to the im ge ontour is ( x yY' the 

sur e norm 1 is n { x y 0)^ n ^ the sur e norm Is to 11 points on 

the o lu ing ontour orrespon to the line t infinity 

Now onsi er the tt he sh ow oun ry 1 pro u e y light sour e 

whose ire tion is si ee igure 6 or 11 points p — 1 si lies in the t ngent 

pi ne i e si m(p) 0 here 1 interse ts the o lu ing ontour the norm 1 

ni n e ire tly etermine rom the me sure ontour s es ri e ove t 
shoul e note th t while 1 n the o lu ing ontour interse t tr nsvers lly 
on the sur e their im ges generi lly sh re ommon t ngent n orm the 
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Fig. 6. e onstru tion up to rom tt he sh ows or single o je t 

in fixe pose these figures show superimpose tt he sh ow ontours i or 

light sour e ire tions he sur e norm 1 where i interse ts the o lu ing 

ontour is enote y he norm 1 t the interse tion o i n j is enote 

y THij ) he three ontours interse t t three points in the im ge ) he three 
ontours meet t ommon point implying th t si S2 n S3 lie on gre t ir le 
o the illumin tion sphere ) ight tt he sh ow oun ries o whi h our 

interse t t pi ,2 n our interse t t pi,s; the ire tion o the light sour es 

Si sg n the sur e norm Is t the interse tion points n e etermine 
up to ) he stru ture o the illumin tion sphere ^ or the light sour e 

ire tions gener ting the tt he sh ow oun ries in ig 6 



res ent moon im ge singul rity Note th t y me suring ni long the o - 
lu ing ontour we o t in onstr int on the light sour e ire tion si 0 

his restri ts the light sour e to line in ^ or to gre t ir le on the illumi- 
n tion sphere ^ he sour e Si n e expresse p r metri lly in the mer 
oor in te system s 



si( 1) os i(ni-z)+sin iz 

rom the sh ows in single im ge it is not possi le to urther onstr in si nor 

oes it seem possi le to o t in ny urther in orm tion out points on 1 

Now onsi er se on tt he sh ow oun ry 2 orme y se on 

light sour e ire tion S2 g in the me surement o n2 (where 2 interse ts 0) 
etermines proje tivelinein ^ (or gre t ir le on th t the light sour e 
S2 must lie on n gener 1 1 n 2 will interse t t one or more visi le sur e 

points the o je t is onvex n the uss m p is ije tive then they only 
interse t t one point pi^2 or non- onvex sur e in 2 m y interse t 

more th n on e owever in 11 ses the ire tion o the sur e norm 1 ni^2 
t the interse tions is 

ni,2 Si(i)-S2(2) (10) 

hus rom the tt he sh ows in two im ges we ire tly me sure ni n U2 

n o t in estim tes or ni^2 si n S2 s un tions o i n 2 

onsi er thir im ge illumin te y S3 in whi h the tt he sh ow 
oun ry 3 does not p ss through pi^2 ( ig 6 ) g in we n estim te 

proje tive line (gre t ir le on ont ining S3 e Iso o t in the sur e 

norm 1 t two ition 1 points the interse tions o 3 with in 2 rom 

the tt he sh ow oun ries or onvex sur e me sure in im ges 

i no three ontours interse t t ommon point the sur e norm 1 n e 

etermine t ( — 1) points s un tion o unknowns i i 1 

owever the num er o unknowns n e re u e when three ontours inter- 
se t t ommon point onsi er ig 6 where ontour 4 interse ts 1 n 

2 t pi^2 n this se we n in er rom the im ges th t si S2 n S4 11 lie in 

the t ngent pi ne to pi^2 n ^ this rne ns th t Si S2 S4 11 lie on the s me 
proje tive line in e n4 n e me sure S4 n e expresse s un tion o 

in 2 ie 



S4( 12) H4 - (si( 1) - S2( 2)) 

hus set o tt he sh ow urves (1 2 4 in ig 6 ) p ssing through 
ommon point (pi,2) is gener te y light sour es (si S2 S4 in ig 6 ) lo- 
te on gre t ir le o ^ he light sour e ire tions n e etermine up 
to two egrees o ree om 1 n 2 Now i in ition se on set o light 
sour es lies long nother proje tive line (the gre t ir le in ig 6 ont ining 

Si S3 S6 S7) the orrespon ing sh ow ontours (1 3 e 7 in ig 6 ) in- 

terse t t nother point on the sur e (pi,3) g in we n express the lo tion 
o light sour es (se S7) on this gre t ir le s un tions o the lo tions o two 
other sour es (si n S3) 



Sii 13) Hi - (Si( 1) - S 3 ( 3)) 

in e Si lies t the interse tion o oth proje tive lines we n estim te the 
ire tion o ny light sour e lo te on either line up to just three egrees o 

ree om 1 2 n 3 urthermore the ire tion o ny other light sour e (sg 

on ig 6 ) n e etermine i it lies on proje tive line efine y two light 
sour es whose ire tions re known up to 1 2 n 3 rom the estim te 
light sour e ire tions the sur e norm 1 n e etermine using 10 t 11 
points where the sh ow urves interse t s mentione e rlier there re ( ^) 
su h points o serve the num er o interse tions in ig 6 t is e sy to veri y 

Ige r i lly th t the three egrees o ree om 1 2 n 3 orrespon to the 

egrees o ree om in 1 2 n 3 he tr nsl tion 4 o the sur e long 

the line o sight nnot e etermine un er orthogr phi proje tion 
6 Discussion 

We have defined notions of shadow equivalence for objects, showing that two
objects differing by a four parameter family of projective transformations
(GPBR) are shadow equivalent under perspective projection. Furthermore, under
orthographic projection, two objects differing by a generalized bas-relief
(GBR) transformation are strongly shadow equivalent, i.e., for any light source
illuminating an object, there exists a light source illuminating a transformed
object such that the shadows are identical. We have proven that GBR is the only
transformation having this property. While we have shown that the occluding
contour is also preserved under GPBR and GBR, it should be noted that image
intensity discontinuities (step edges) arising from surface normal
discontinuities or albedo discontinuities are also preserved under these
transformations, since these points move along the line of sight and are
viewpoint and (generically) illumination independent. Consequently, edge-based
recognition algorithms should not be able to distinguish objects differing by
these transformations, nor should edge-based reconstruction algorithms be able
to perform Euclidean reconstruction without additional information.

In earlier work where we concentrated on light sources at infinity [4, 3], we
showed that for any set of point light sources, the shading as well as the
shadowing of an object with Lambertian reflectance are identical to the shading
and shadowing of any generalized bas-relief transformation of the object, i.e.,
the illumination cones [4] are identical. This is consistent with the
effectiveness of well-crafted relief sculptures in conveying a greater sense of
the depth than is present. It is clear that shading is not preserved for GPBR
or for GBR when the light sources are proximal; the image intensity falls off
by the reciprocal of the squared distance between the surface and light source,
and distance is not preserved under these transformations. Nonetheless, for a
range of transformations and for some sets of light sources, it is expected
that the intensity may only vary slightly.

Furthermore, we have shown that it is possible to reconstruct a surface up
to GBR from the shadow boundaries in a set of images. To implement a
reconstruction algorithm based on the ideas in the preceding section requires
detection of cast and attached shadow boundaries. While detection methods have
been presented [29], it is unclear how effective these techniques would be in
practice. In particular, attached shadows are particularly difficult to detect
and localize, since for a Lambertian surface with constant albedo there is a
discontinuity in the intensity gradient or shading flow field, but not in the
intensity itself. On the other hand, there is a step edge at a cast shadow
boundary, and so extensions of the method described in the preceding section
which use information about cast shadows to constrain the light source
direction may lead to practical implementations.
Leonardo da Vinci’s statement that shadows of relief sculpture are
“foreshortened” is, strictly speaking, incorrect. However, reliefs are often
constructed in a manner such that the cast shadows will differ from those
produced by sculpture in the round. Reliefs have been used to depict narratives
involving numerous figures located at different depths within the scene. Since
the sculpting medium is usually not thick enough for the artist to sculpt the
figures to the proper relative depths, sculptors like Donatello and Ghiberti
employed rules of perspective to determine the size and location of figures,
sculpting each figure to the proper relief [16]. While the shadowing for each
figure is self consistent, the shadows cast from one figure onto another are
incorrect. Furthermore, the shadows cast onto the background, whose orientation
usually does not correspond to that of a wall or floor in the scene, are also
inconsistent. Note, however, that ancient Greek sculpture was often painted; by
painting the background of the Parthenon Frieze dark blue [7], cast shadows
would be less visible and the distortions less apparent. Thus, Leonardo’s
statement is an accurate characterization of complex reliefs such as Ghiberti’s
East Doors on the Baptistery in Florence, but does not apply to figures
sculpted singly.
Abstract, n h p p w u y low Ivl g gn onnh 

no Iz u f wo k p opo y h n M 1 k (199 ). h go 1 
Let G = (V, E) be a weighted undirected graph, where V are the nodes and
E are the edges. Let A, B be a partition of the graph: A ∪ B = V, A ∩ B = ∅.
In graph theoretic language, the similarity between these two groups is called
the cut:

$$\mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$$

where w(u, v) is the weight on the edge between nodes u and v. Shi and Malik
proposed to use a normalized similarity criterion to evaluate a partition. They
call it the normalized cut:

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{asso}(A, V)} + \frac{\mathrm{cut}(B, A)}{\mathrm{asso}(B, V)}$$

where asso(A, V) = \sum_{u \in A, t \in V} w(u, t) is the total connection from
nodes in A to all the nodes in the graph. For more discussion on this criterion,
please refer to [11].

One key advantage of using the normalized cut is that a good approximation
to the optimal partition can be computed very efficiently. Let W be the
association matrix, i.e. W_ij is the weight between nodes i and j in the graph.
Let D be the diagonal matrix such that D_ii = \sum_j W_ij, i.e. D_ii is the sum
of the weights of all the connections to node i. Shi and Malik showed that the
optimal partition can be found by computing:

$$y = \arg\min \mathrm{Ncut} = \arg\min_{y} \frac{y^\top (D - W)\, y}{y^\top D\, y} \qquad (1)$$

where y = {a, b}^N is a binary indicator vector specifying the group identity
of each pixel, i.e. y_i = a if pixel i belongs to group A and y_j = b if pixel j
belongs to B. N is the number of pixels. Notice that the above expression is
the Rayleigh quotient. If we relax y to take on real values (instead of two
discrete values), we can optimize Equation 1 by solving a generalized
eigenvalue system. Efficient algorithms with polynomial running time are
well-known for solving such problems. Therefore, we can compute an
approximation to the optimal partition very efficiently. For details of the
derivation of Equation 1, please refer to [11].
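The relaxation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a small dense symmetric weight matrix with strictly positive row sums, converts the generalized problem (D - W) y = λ D y into a standard symmetric one, and splits the nodes by thresholding the second eigenvector at zero (the threshold is an assumed, simple choice). The names `ncut_bipartition`, `W_demo` and `labels_demo` are hypothetical.

```python
import numpy as np

def ncut_bipartition(W):
    """Approximate normalized-cut bipartition for a symmetric weight matrix W."""
    d = W.sum(axis=1)                      # D_ii: total connection of node i
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # (D - W) y = lambda D y  is equivalent to
    # D^{-1/2} (D - W) D^{-1/2} z = lambda z  with  z = D^{1/2} y.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]             # second smallest eigenvector
    return y > 0.0                             # assumed threshold at zero

# Two tightly connected groups of three nodes, weakly linked to each other.
W_demo = np.full((6, 6), 0.01)
W_demo[:3, :3] = 1.0
W_demo[3:, 3:] = 1.0
np.fill_diagonal(W_demo, 0.0)
labels_demo = ncut_bipartition(W_demo)
```

For large images the weight matrix is sparse, so a sparse eigensolver would normally replace the dense `eigh` call used here.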



3 The Mass-Spring Analogy 

As we have just seen, the Normalized Cut algorithm requires the solution of a
generalized eigensystem involving the weighted adjacency matrix. In this section
we develop the intuition behind this process by considering a physical
interpretation of the eigensystem as a mass-spring system.

n ng h u op Ip on n o pi p o 1 . 




1 



n M Ik 
4 Local Image Features 
4.1 Brightness, Color, and Texture 
4.2 Contour Continuity 
4.4 Results 
5 Discussion 
Abstract. The objective of this work is the automatic detection and 
grouping of imaged elements which repeat on a plane in a scene (for 
example tiled floorings). It is shown that structures that repeat on a scene 
plane are related by particular parametrized transformations in perspec- 
tive images. These image transformations provide powerful grouping con- 
straints, and can be used at the heart of hypothesize and verify grouping 
algorithms. The parametrized transformations are global across the im- 
age plane and may be computed without knowledge of the pose of the 
plane or camera calibration. 

Parametrized transformations are given for several classes of repeating 
operation in the world as well as groupers based on these. These groupers 
are demonstrated on a number of real images, where both the elements 
and the grouping are determined automatically. 

It is shown that the repeating element can be learnt from the image, and 
hence provides an image descriptor. Also, information on the plane pose, 
such as its vanishing line, can be recovered from the grouping. 



1 Introduction 

Grouping is one of the most fundamental objectives of Computer Vision and 
pervades most of the disparate sub-disciplines; for example object recognition 
always involves perceptual organization (or figure/ground separation) to some 
extent; shape-from-texture involves grouping texels; boundary detection involves 
grouping edgels (e.g. saliency of curves); motion segmentation involves grouping 
independently moving objects over multiple frames etc. 

In this paper we investigate the grouping of repeated structures. The 
motivation for this is three-fold: first, repetitions are common in the world 
(examples include parquet floor tilings, windows, bricks, patterns on fabrics, 
and wallpaper); second, the groupings provide a compact image descriptor, 
essentially a ‘high level’ feature, which may be used for image matching, for 
example in image database retrieval, model based recognition, and stereo 
correspondence; third, the retrieved repeating operation can provide shape and 
pose information, for example the vanishing line of a plane, in a similar 
manner to that of shape-from-texture. 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 165-181, 1999. 
(c) Springer-Verlag Berlin Heidelberg 1999 







Frederik Schaffalitzky and Andrew Zisserman 



To be specific the objective is quite simply stated: suppose a structure is 
repeated in the world a number of times by some operation (for example a 
translation); then identify this structure and all its repetitions from a perspec- 
tive image. The outcome is the imaged element, and a grouping over the imaged 
repetitions. This simple statement does belie the actual difficulty of a 
computational procedure since a priori the element is unknown; in fact the 
element only ‘exists’ because it is repeated by the (unknown) image operation. 

For the body of the paper we specialize the operation to repetitions on a 
plane, and return to a more general setting in section 4. In particular it is shown 
in section 2 that the operation of repeating by a translation on a scene plane 
induces relations between the imaged elements. These relations are represented 
by a parametrized transformation. There are only four parameters that need be 
specified, and these may be determined from the image. 

The significance of this transformation is that it provides a necessary condi- 
tion that must be satisfied by imaged elements related by a translation operation 
on a scene plane. The transformation is powerful as a basis for a grouping al- 
gorithm (a grouper) because of the following properties: it is global across the 
image plane; the class of transformation is independent of camera calibration; 
and, the class is independent of the pose of the scene plane. Furthermore, the 
transformation is exact under perspective projection, i.e. it does not require a 
weak perspective approximation. A grouper for this parametrized transformation 
is described in section 3. 

Image relations of this type have appeared before in the literature. For 
example, “1D relations” such as that a line is imaged as a line; that collinear 
points are imaged as collinear points; and, that parallel lines are imaged as 
concurrent lines; all have the above useful properties. These 1D relations have 
been used 
by Lowe [6], amongst others, as a basis for perceptual grouping. The relations 
described in this paper may be thought of as “2D relations”, and have been 
investigated previously by [4, 13]. Repeated 3D (i.e. non-planar) structures also 
induce image relations [7]. For example, points on objects with bilateral sym- 
metry (they are repeated by a reflection operation), or more generally points 
repeated by any 3D projective transformation, satisfy an epipolar geometry con- 
straint in the image [5, 7, 9]. There are also relations on the image outlines of 
particular classes of curved surfaces, such as straight homogeneous generalized 
cylinders, and these have been employed in grouping algorithms [8, 15, 16]. 



2 The Image Relation Induced by Repetitions on a Plane 

In this section we describe the image transformation that arises from the op- 
eration of repeating by a translation on a scene plane. The derivation is very 
short. 

In general the scene plane and image plane are related by a planar 
homography (a plane projective transformation). This map is written x = PX, 
where P is a 3 × 3 homogeneous matrix, and x and X are homogeneous 3-vectors 
representing corresponding points on the image and scene plane respectively. 
The transformation P has 8 degrees of freedom (dof). 

On the scene plane the repeating translation is represented as X' = TX, 
where 

$$T = \begin{pmatrix} I & t \\ 0^\top & 1 \end{pmatrix}$$

with I the 2 × 2 identity and t the translation 2-vector. 

The image transformation H between the points x and x', which are the 
images of X and X', will be called a conjugate translation. The reason for this 
is evident from x' = PX' = PTX = PTP^{-1}x, so that x' = Hx, where H = PTP^{-1}. 
See figure 1. 

Fig. 1. A translation T on a world plane induces a conjugate translation H in the 
image. 
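As a numerical check of the relation H = P T P^{-1}, the sketch below builds a conjugate translation from an arbitrary invertible homography and confirms that it maps the image of a scene point to the image of the translated scene point. The matrix `P_demo`, the translation `t_demo` and the function name are made-up values for illustration only.

```python
import numpy as np

def conjugate_translation(P, t):
    """Return H = P T P^{-1} for a scene-plane translation t = (tx, ty)."""
    T = np.array([[1.0, 0.0, t[0]],
                  [0.0, 1.0, t[1]],
                  [0.0, 0.0, 1.0]])
    return P @ T @ np.linalg.inv(P)

# Hypothetical homography between scene plane and image (any invertible 3x3).
P_demo = np.array([[1.0,   0.2,   3.0],
                   [0.1,   0.9,   1.0],
                   [0.001, 0.002, 1.0]])
t_demo = (2.0, 0.0)
H_demo = conjugate_translation(P_demo, t_demo)

X = np.array([1.0, 4.0, 1.0])                       # scene point
X_shifted = X + np.array([t_demo[0], t_demo[1], 0.0])
x = P_demo @ X                                      # image of X
x_shifted = P_demo @ X_shifted                      # image of the translated point

# Image of the translation direction: the vanishing point, a fixed point of H.
v_demo = P_demo @ np.array([1.0, 0.0, 0.0])
```

Here `H_demo @ x` agrees with `x_shifted`, and `H_demo @ v_demo` returns `v_demo`, as expected for the vanishing point of the translation direction.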



The conjugate translation H may be written as 

$$H = I + \lambda\, v\, l_\infty^\top \quad \text{with} \quad v \cdot l_\infty = 0 \qquad (1)$$

where I is the 3 × 3 identity, and 

– The 3-vector v is the vanishing point of the translation direction. It is a fixed 
point of H. 

– The 3-vector l_∞ is the vanishing line of the scene plane. It is a line of fixed 
points under H. 

– The scalar λ represents the translation magnitude. 

The geometric interpretation is illustrated in figure 2. 

The transformation has only 4 dof, and these may be specified by the line 
l_∞ (2 dof), a point v on l_∞ (1 dof), and λ (1 dof). This is four fewer dof than 
a general homography, and two fewer dof than the canonical and ‘simple’ affine 
transformation used by many authors in the past for this type of grouping [4], 
yet the transformation H exactly models perspective effects which are not 
accounted for by an affine transformation. 

Fig. 2. Geometric interpretation of the parameters of a conjugate translation 
(elation). 

A few remarks on this transformation: The transformation applies to two 
elements repeated by the translation anywhere on the image plane. If there is 
a line of repetitions (as in figure 1) then the zeroth element is mapped to the 
n-th as H^n = I + nλ v l_∞^T. The transformation (1) can be determined from two 
point or two line correspondences. Once the transformation is determined, then 
so is l_∞. A planar projective transformation with a line of fixed points, and 
fixed points only on this line, is known in the literature as an elation [10, 11]. 
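The remark that two point correspondences determine the elation can be made concrete. The sketch below is one possible construction under stated assumptions, not the authors' solver. It uses the fact that x' is proportional to x + (m·x) v with m = λ l_∞, so v is the intersection of the lines joining each point to its image, and m follows from a 3 × 3 linear system. The function names and the demo values are hypothetical, and the construction fails when the two points and v are collinear.

```python
import numpy as np

def elation(v, l_inf, lam):
    """H = I + lam * v * l_inf^T (requires v . l_inf = 0)."""
    return np.eye(3) + lam * np.outer(v, l_inf)

def elation_from_two_points(x1, x1p, x2, x2p):
    """Estimate an elation from two homogeneous point correspondences."""
    # Every point moves along the line through itself and v, so v is the
    # intersection of the two lines x1 x1' and x2 x2'.
    v = np.cross(np.cross(x1, x1p), np.cross(x2, x2p))
    v = v / np.linalg.norm(v)

    def shift(x, xp):
        # Scalar a such that xp is proportional to x + a * v.
        c = np.cross(xp, v)
        return -np.dot(np.cross(xp, x), c) / np.dot(c, c)

    # Solve for m = lam * l_inf from m.x1 = a1, m.x2 = a2, m.v = 0.
    A = np.vstack([x1, x2, v])
    m = np.linalg.solve(A, np.array([shift(x1, x1p), shift(x2, x2p), 0.0]))
    return np.eye(3) + np.outer(v, m)

# Made-up elation for the demonstration.
l_demo = np.array([0.001, 0.002, 1.0])
v_demo = np.array([2.0, -1.0, 0.0])          # v_demo . l_demo = 0
H_true = elation(v_demo, l_demo, 50.0)

x1 = np.array([10.0, 20.0, 1.0])
x2 = np.array([30.0, 5.0, 1.0])
H_est = elation_from_two_points(x1, H_true @ x1, x2, H_true @ x2)

# Repeating the elation three times only scales the translation magnitude.
H_cubed = np.linalg.matrix_power(H_true, 3)
```

Because v · l_∞ = 0, the product of the rank-one term with itself vanishes, which is why `H_cubed` equals `elation(v_demo, l_demo, 150.0)`.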



2.1 Grids 

An extension to repeating by a single translation is where there is a repetition in 
two directions so that the world pattern is a grid of repeated elements. The image 
is then a conjugate grid. This mapping can be thought of as being composed of 
two elations 

$$H_v = I + \lambda\, v\, l_\infty^\top, \qquad H_u = I + \mu\, u\, l_\infty^\top$$

one for each direction u, v, i.e. a total of six degrees of freedom. However, note 
that l_∞ is common to both, so that once the transformation is determined in 
one direction only two degrees of freedom remain for the transformation in the 
other direction. These two degrees of freedom can be determined by one point 
correspondence. 
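A short sketch of how a conjugate grid can be generated from the two elations follows. Since u and v both lie on l_∞, the two elations commute and H_u^i H_v^j = I + iμ u l_∞^T + jλ v l_∞^T, which is what the code evaluates. The function name and the fronto-parallel demo values are assumptions chosen so the result is easy to read.

```python
import numpy as np

def conjugate_grid(x0, u, v, l_inf, mu, lam, rows, cols):
    """Image positions of grid element (i, j), obtained as H_u^i H_v^j x0."""
    pts = np.zeros((rows, cols, 2))
    for i in range(rows):
        for j in range(cols):
            # Closed form of H_u^i H_v^j, valid because u.l_inf = v.l_inf = 0.
            H = (np.eye(3)
                 + i * mu * np.outer(u, l_inf)
                 + j * lam * np.outer(v, l_inf))
            x = H @ x0
            pts[i, j] = x[:2] / x[2]
    return pts

# Fronto-parallel example: l_inf is the line at infinity, so the grid is regular.
l_demo = np.array([0.0, 0.0, 1.0])
u_demo = np.array([0.0, 1.0, 0.0])
v_demo = np.array([1.0, 0.0, 0.0])
grid_demo = conjugate_grid(np.array([0.0, 0.0, 1.0]),
                           u_demo, v_demo, l_demo, 3.0, 2.0, 2, 3)

H_u = np.eye(3) + 3.0 * np.outer(u_demo, l_demo)
H_v = np.eye(3) + 2.0 * np.outer(v_demo, l_demo)
```

With a general vanishing line the same function produces the perspectively distorted grid; only the inputs change.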

3 Grouping Imaged Repeated Patterns 

In the previous section it has been shown that elements that repeat by a 
translation on the plane are related in the image by an elation. Thus the problem 
of grouping repeated patterns can be reduced to that of finding image elements 
related by an elation, and the rest of this section describes a grouping algorithm 
for elations. 

Initially we do not know the elements or the transformation. This is the 
chicken and egg problem that often arises in computer vision: if we know the 
elements we can easily determine the four parameters of the transformation; 
conversely, if we know the transformation we can (relatively) easily identify el- 
ements. In essence then the grouping algorithm must determine simultaneously 
a transformation (model) and elements consistent with that transformation. A 
similar situation arises in estimating multiple view relations from several images 
of a scene, for example the epipolar geometry [12], and ideas can be borrowed 
from there. 

In outline the idea is to first hypothesize a set of elements and associa- 
tions between these elements. This set is then explored to evaluate if it contains 
groupings consistent with an elation. This is a search, but it can be made very 
efficient by a hypothesize and verify approach: the four parameters of the elation 
are determined from a small number (one or two) of associations, and this hy- 
pothesized elation is then verified by testing how many members of the set are 
mapped under it. This search is equivalent to the problem of robustly fitting a 
model to data containing outliers. In this case the model is the transformation, 
and the outliers are the members of the set which are not mapped under the 
transformation. Depending on how the elements and associations are obtained, 
a very large proportion of the set may consist of outliers. 

The algorithm is summarized in the following section, and illustrated by 
working through an example. 



3.1 Elation Grouping Algorithm 

There are five stages to the algorithm. The first two stages are aimed at obtaining 
seed correspondences. The seeds are elements and their associations, and should 
be sufficiently plentiful that some of the actual elements and associations of 
the sought elation are included. It is not the aim at this stage that all seed 
correspondences are correct. 

1. Compute interesting features. These may include interest points (e.g. corners), 
edges, closed regions, oriented texture determined by a set of filter banks etc. 
The aim is simply to identify regions of the image that are sufficiently interesting. 
See figures 3-5. 

2. Associate features. An affinity score is then employed to associate features 
that may be related. Generally the affinity is based on ‘similarity’ and 
proximity. An example would be cross-correlation of nearby interest point 
intensity neighbourhoods. Primarily the choice of affinity score is driven by 
the invariance sought on the scene plane: for example, that the albedo 
(reflectance) of an element should be exactly repeated in the scene. However, 
illumination and imaging effects require that the affinity score has a greater 
degree of photometric and geometric invariance. At the most basic level the aim 
is to use photometric cues to filter out obvious mismatches (e.g. matching a 
predominantly black region with a predominantly white region), but to retain 
plausible matches. It is important that the affinity score is at least partially 
geometrically invariant to the transformation sought: if the affinity score is 
too sensitive to the effects of the transformation, it could reject correct 
matches. For example, intensity cross-correlation is invariant to translation, 
but not to the rotation and skewing which occur under an elation. 

One approach is to use combined affine/photometric invariants [14]. These 
can be applied to regions bounded by automatically detected closed curves. The 
advantages of such invariants are two-fold: first, invariants can be matched 
efficiently using indexing; second, they can be associated globally across the image. 
An example is shown in figure 3. Another approach is to determine geometric 
features, since these are largely invariant to photometric conditions, and then 
associate the features based on the intensity cross correlation of their neighbour- 
hoods. An example of this is shown in figure 6. 
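The second approach, associating nearby features by the cross-correlation of their intensity neighbourhoods, might look like the following sketch. It uses zero-mean normalized cross-correlation, which is one common choice and an assumption here, since the text does not state the exact score. The function names, window size and thresholds are hypothetical.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = np.asarray(patch_a, dtype=float).ravel()
    b = np.asarray(patch_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > eps else 0.0

def associate(image, points, half=3, threshold=0.8, max_dist=50.0):
    """Pair up feature points (x, y) that are close and look similar."""
    patches = [image[y - half:y + half + 1, x - half:x + half + 1]
               for x, y in points]
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = np.hypot(points[i][0] - points[j][0],
                            points[i][1] - points[j][1])
            if dist <= max_dist and ncc(patches[i], patches[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Synthetic image: one random tile repeated three times horizontally.
tile_demo = np.random.default_rng(0).random((10, 10))
image_demo = np.tile(tile_demo, (1, 3))
points_demo = [(5, 5), (15, 5), (25, 5)]
pairs_demo = associate(image_demo, points_demo, max_dist=12.0)
```

The score is unchanged by a gain and offset in intensity, which gives some of the photometric invariance asked for above, but it is not invariant to rotation or skew.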




one cluster verified grouping enlarged grouping 



Fig. 3. Seed matches using closed curves. The idea here is to identify interesting 
regions by detecting closed Canny [1] edge contours, and then determine if these 
regions are related by affine transformations by computing their affine texture 
moment invariants [14]. Regions which are related by an affine transformation 
have the same value for affine invariants. Thus clustering on the invariants yields 
a putative grouping of regions. Eight affine invariants are computed, so each 
curve gives a point in an 8-dimensional space. The points are clustered in this 8D 
space by the k-means clustering algorithm. The plot shows the distribution and 
clustering of the zeroth order moments of shape (horizontal axis) and intensity 
(vertical axis). The cluster used as a hypothesised grouping is the bottom 
left-most one. 

The next stage is a robust estimation of the elation based on the seed 
correspondences. 

3. RANSAC [2] robust estimation. An elation can be instantiated from a minimal 
number of correspondences that provide either (a) two line correspondences, no 
two of which are collinear or (b) two line correspondences, two lines of which 
are collinear, and one point correspondence on the other two lines. The robust 
estimation then proceeds as follows: 

1. Select a random minimal sample of seed correspondences and compute the 
elation H. 

2. Compute the number of inliers consistent with H, i.e. the number of other 
seed correspondences that map under H. 

3. Choose the H with the largest number of inliers. 

For example, in figure 7, the white lines denote the initial seed correspondence 
chosen and the darker^ lines denote the correspondences found to be consistent 
with the elation estimated from the seed. 
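The hypothesize-and-verify loop of stage 3 can be sketched as follows. To keep the example short it draws minimal samples of two point correspondences rather than the line correspondences listed above, and it scores a hypothesis by image transfer error. Both choices are assumptions for illustration. All function names, the tolerance and the synthetic data are hypothetical.

```python
import numpy as np

def elation_from_two_points(x1, x1p, x2, x2p):
    """Minimal solver: elation I + v m^T from two point correspondences."""
    v = np.cross(np.cross(x1, x1p), np.cross(x2, x2p))
    v = v / np.linalg.norm(v)

    def shift(x, xp):
        c = np.cross(xp, v)
        return -np.dot(np.cross(xp, x), c) / np.dot(c, c)

    A = np.vstack([x1, x2, v])
    m = np.linalg.solve(A, np.array([shift(x1, x1p), shift(x2, x2p), 0.0]))
    return np.eye(3) + np.outer(v, m)

def transfer_error(H, x, xp):
    """Image distance between H x and x' for rows of homogeneous points."""
    y = x @ H.T
    return np.linalg.norm(y[:, :2] / y[:, 2:3] - xp[:, :2] / xp[:, 2:3], axis=1)

def ransac_elation(x, xp, n_iter=500, tol=0.5, seed=0):
    rng = np.random.default_rng(seed)
    best_H, best_inliers = None, np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(x), size=2, replace=False)   # minimal sample
        try:
            H = elation_from_two_points(x[i], xp[i], x[j], xp[j])
        except np.linalg.LinAlgError:
            continue                                       # degenerate sample
        if not np.all(np.isfinite(H)):
            continue
        inliers = transfer_error(H, x, xp) < tol           # verification
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers

# Synthetic seed set: 20 correspondences under a made-up elation, 10 outliers.
rng_demo = np.random.default_rng(1)
H_true = np.eye(3) + 20.0 * np.outer([2.0, -1.0, 0.0], [0.001, 0.002, 1.0])
x_demo = np.column_stack([rng_demo.uniform(0, 100, (30, 2)), np.ones(30)])
xp_demo = x_demo @ H_true.T
xp_demo[20:, :2] = rng_demo.uniform(0, 100, (10, 2))
H_demo, inliers_demo = ransac_elation(x_demo, xp_demo)
```

With a large outlier fraction the number of iterations would need to grow, since the chance of drawing two correct seeds in one sample falls quickly.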

The RANSAC fit provides an initial estimate of the elation. This estimate is 
then refined by the following stage. 

4. Maximum Likelihood Estimation (MLE). Re-estimate the four parameters of 
H from all correspondences classified as inliers by minimizing a ML cost func- 
tion. A ML estimation requires the estimation of the elation together with a 
set of auxiliary points which map exactly under the estimated elation. The cost 
function is the image distance between the measured and auxiliary points. As- 
suming the measurement error is Gaussian, then minimizing this cost function 
provides the ML estimate of the elation. See [12] for a description of MLE for 
homographies. 

For the example at hand, the vanishing line and vanishing point of the MLE 
are shown in figure 8. 

5. Guided matching. Using the estimated parameters search for new elements 
consistent with the model by defining a search region about the transferred 
element position. As figure 8 shows, the location of new elements predicted by 
the MLE can be very accurate. 
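Guided matching amounts to transferring a known element under the estimated elation and looking for candidates inside a search region around the predicted position. The sketch below assumes a circular region and point-like candidates. The radius, the function names and the demo values are hypothetical.

```python
import numpy as np

def transfer(H, xy):
    """Map an inhomogeneous image point under the homography H."""
    y = H @ np.array([xy[0], xy[1], 1.0])
    return y[:2] / y[2]

def guided_matches(H, element_xy, candidates_xy, radius=3.0):
    """Return the predicted position and the candidates that fall near it."""
    predicted = transfer(H, element_xy)
    dists = np.linalg.norm(np.asarray(candidates_xy, dtype=float) - predicted,
                           axis=1)
    return predicted, np.flatnonzero(dists <= radius)

# A pure image translation is the simplest elation (l_inf = (0, 0, 1)).
H_demo = np.array([[1.0, 0.0, 10.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
pred_demo, hits_demo = guided_matches(
    H_demo, (5.0, 5.0), [(15.5, 5.0), (40.0, 40.0), (14.0, 7.0)])
```

In practice each candidate inside the region would still be checked for similarity to the transferred element before being accepted.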

Further examples of elation grouping, using exactly the same algorithm, are 
shown for other images in figures 9 and 10. 



3.2 Grid Grouping Algorithm 

A similar hypothesize and verify algorithm to the elation grouper may be applied 
to the case of a conjugate grid, described in section 2.1. Examples of the grid 
grouper are shown in figures 11-14. 



^ Blue in the luxury edition of this paper. 
Original image. Fitted lines. 

Fig. 4. The sought (but unknown) element/grouping is the repeated floor tiling. 
The features which successfully provide this element/grouping are line pair in- 
tersections. The first stage in determining the features is fitting straight lines to 
Canny edge detector output. 




Line intersections Lines and points together 



Fig. 5. Left: points of intersection of the lines found above. The line segments 
are extended slightly to allow intersections just beyond their endpoints. Right: 
lines and intersection points together. Note that line intersections do identify 
the corners of the floor tilings, but these points are only a small proportion of 
all the intersections detected. 
Closeup of features 



Seed matches 



Fig. 6. Left: a closeup of the computed intersections. Right: the black lines join 
pairs of intersections which are deemed to look similar on the basis of intensity 
neighbourhood correlation. Line pairs which reverse orientation are excluded, 
since these cannot map under an elation. 




Sample with highest support 




Sample with next highest support 



Fig. 7. These two images demonstrate the core of the method: each seed match 
(shown in white) is sufficient to determine an elation in the image. These putative 
elations can be verified or rejected by scoring them according to the number 
of feature correspondences consistent with them. The two seed matches whose 
corresponding elations received the highest support are shown. 








Fig. 8. Given the inliers to the elation grouping, the parameters of the elation 
can be estimated (MLE) more accurately. Left: the ground plane vanishing line 
and vanishing point of the translation direction. Note that the horizontal line 
is a very plausible horizon line and that the feature tracks all pass through 
the vanishing point. Right: the accuracy of the estimated parameters is also 
demonstrated by transferring elements under the elation: the extended tiling is 
obtained by mapping the original image lines under the estimated elation. 




Fig. 9. More examples of the elation grouper in action. The corners of the win- 
dows of the building on the left have been grouped together by the elation 
constraint. The wall on the right has two sizes of brick, but they are grouped 
together here by virtue of satisfying the same elation constraint. 
Fig. 10. These figures show two groupings found in the same image by the 
elation grouper. Despite the difference in pose of the planes in the world, the 
same grouping algorithm is successful for both cases. 



The importance of guided matching, which is the final stage of the algorithm, 
is very well illustrated in these examples. The previous stages of the algorithm 
have delivered a ML estimate of the transformation, and a number of elements 
which are mapped under the estimated transformation. In the case of a grid it is 
a simple matter to determine which of the integer grid positions is unoccupied, 
and then search the image for evidence of an element at the corresponding image 
point. 
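For the grid case, the bookkeeping described above can be sketched as follows: list the integer grid positions inside the bounding box of the occupied ones that are still empty, and predict where each would appear in the image. This assumes the occupied positions are already expressed as integer (i, j) indices and that the two elations are known. All names and demo values are hypothetical.

```python
import numpy as np

def unoccupied_cells(occupied):
    """Integer grid positions inside the bounding box that hold no element."""
    rows = [i for i, _ in occupied]
    cols = [j for _, j in occupied]
    return [(i, j)
            for i in range(min(rows), max(rows) + 1)
            for j in range(min(cols), max(cols) + 1)
            if (i, j) not in occupied]

def predict_position(x0, u, v, l_inf, mu, lam, i, j):
    """Image position of grid cell (i, j), i.e. H_u^i H_v^j applied to x0."""
    H = np.eye(3) + i * mu * np.outer(u, l_inf) + j * lam * np.outer(v, l_inf)
    x = H @ x0
    return x[:2] / x[2]

occupied_demo = {(0, 0), (0, 1), (1, 1)}
missing_demo = unoccupied_cells(occupied_demo)
pos_demo = predict_position(np.array([0.0, 0.0, 1.0]),
                            np.array([0.0, 1.0, 0.0]),   # u
                            np.array([1.0, 0.0, 0.0]),   # v
                            np.array([0.0, 0.0, 1.0]),   # l_inf
                            3.0, 2.0, *missing_demo[0])
```

Each predicted position is then a place to search the image for evidence of a missed element.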

In detail an element is verified by comparing its similarity to the nearest (in 
the image) existing element of the grid. In figure 12 the similarity is measured 
by cross-correlation. This procedure identifies elements which have been missed 
in the initial feature detection. There may well not be any features present, but 
because the transformation and intensity are tightly estimated, false positives 
are not generated. Another possibility is to reapply the segmentation in the 
indicated region, but with the segmentation parameters suitably modified to 
allow for a perspective scaling. For example, if a square is 50% of the size 
of its neighbour, then the Gaussian width of a Canny edge detector could be 
reduced to detect sharper edges, and the line length thresholds also reduced. 

The output of the grid grouper provides many examples of the imaged element. 
From these we can now estimate the frontoparallel intensity on the tile: each 
projectively distorted element is warped into the unit square and the resulting 
textures are averaged in the unit square. The image can then be synthetically 
generated by applying the learnt transformation to the estimated intensity of the 
element. This is demonstrated in both figure 13 and figure 14. It demonstrates 
that the element plus grouping does provide a succinct description for substantial 
parts of the image. 

Fig. 11. The first stage of the grid grouping algorithm. From the original image 
(left), an initial grouping (right) is computed by associating features using only 
correlation of intensity neighbourhoods. 

Grid structure found in the grouping New elements found by guided search 

Fig. 12. The second stage of the grid grouping algorithm. The initial grouping 
found is processed to elucidate the spatial organisation, namely the grid-like 
structure of the locations of the elements. This structure is then used to guide 
a global search for new elements. Note that although only half of the potential 
elements are determined by the initial fit of figure 11, the tight constraints on 
geometry and intensity provided by the transformation enable virtually every 
visible element to be identified. 

Fig. 13. The floor is generated from the learnt element and spatial organization 
of the grid. 

Fig. 14. Another example of the grid grouper. Left: original image. Right: the 
pattern is generated from the element (which is included as an inset) and 
grouping determined by the algorithm. Note the algorithm only selects elements 
belonging to the grid. The two planes in the scene are geometrically 
indistinguishable. 

3.3 Grouping Performance 

It is evident from these examples that the elation and grid groupers perform 
extremely well: for example, the grid grouper identifies all the non-occluded 
elements with no false positives. This success can be attributed largely to the 
fact that the transformation has been modelled exactly, and that it is heavily 
over-determined by the available image data, i.e. there are many more 
correspondences, which provide constraints, than the four parameters which must 
be determined. 

Ideally the algorithm should return a description of the element and a spatial 
organisation of the grouping. It is easier to determine the element for the grid 
than for the elation, because in the case of the grid the element is delineated 
in both directions, whereas for the elation the element is only demarked in the 
translation direction. 

The organization of the grouping is quite primitive at present, consisting 
of little more than the grid positions occupied. A more compact description 
would be the element and a set of operations which generate the grid. Such a 
description is not uniquely defined of course, as the same grid can be generated 
by repeating an element by one unit or by repeating a pair of elements at two 
units of spacing. In fact, for the group of integer displacements on the plane along 
the two coordinate axes, the grid can be generated by any one of the following 
sets of translation vectors 




Although clearly the first two are a more suitable choice as a basic generator. 
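
The non-uniqueness of the generators can be checked numerically. The following is a minimal sketch, not part of the original paper: it relies on the standard fact that two integer translation vectors generate the whole unit grid exactly when the determinant of the 2x2 matrix they form is +1 or -1. The function names and the sample vectors are illustrative assumptions; they are not the sets listed in the original text.

```python
from itertools import product


def generates_integer_grid(v1, v2):
    """True if integer combinations of v1 and v2 reach every point of Z^2.

    For integer vectors this holds exactly when det[v1 v2] is +1 or -1.
    """
    det = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(det) == 1


def reachable(v1, v2, radius=2, span=12):
    """Grid points within `radius` of the origin reached by a*v1 + b*v2,
    with integer coefficients a, b in [-span, span] (brute force)."""
    points = set()
    for a, b in product(range(-span, span + 1), repeat=2):
        x = a * v1[0] + b * v2[0]
        y = a * v1[1] + b * v2[1]
        if abs(x) <= radius and abs(y) <= radius:
            points.add((x, y))
    return points


# Hypothetical generator pairs: the first three generate the same grid,
# the last one skips every other column.
candidates = [((1, 0), (0, 1)), ((1, 0), (1, 1)), ((2, 1), (1, 1)), ((2, 0), (0, 1))]
for v1, v2 in candidates:
    print(v1, v2, generates_integer_grid(v1, v2), len(reachable(v1, v2)))
```

The brute-force count is only a sanity check on the determinant test; a pair with determinant 2, such as the last one, covers only part of the grid.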

One idiosyncrasy of using the number of inliers as a scoring mechanism in 
RANSAC is that generators at the smallest repeating distance will always be 
selected because there will be more of these present in the seed set. 
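
A toy count illustrates why inlier scoring favours the smallest repeat distance. This is a sketch under simplifying assumptions (a single row of equally spaced elements, exact positions, no outliers) and is not the RANSAC procedure of the paper; `support` is a hypothetical helper name.

```python
def support(n_elements, step):
    """Number of element pairs in a row of `n_elements` equally spaced
    elements that are consistent with a translation by `step` units."""
    positions = range(n_elements)
    present = set(positions)
    return sum(1 for p in positions if p + step in present)


# With ten elements, a one-unit translation is supported by more pairs
# than any larger multiple of it.
for step in (1, 2, 3):
    print(step, support(10, step))
```

In a row of n elements a translation by k units is supported by n - k pairs, so the count decreases as k grows and the unit translation always scores highest.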

There are also various meta-groupings that could be used to spatially organize 
the data. For example the top windows in figure 3 may be organized as four 
meta-groupings, each consisting of nine grouped elements. 

4 Conclusions and Extensions 

Here we have investigated in detail one repeating operation on a plane, namely 
a translation, for which the induced transformation is an elation. This serves 
as an exemplar for the other wall paper groups (discrete subgroups of the 2D 
affine group) of repeating operations on a plane, such as glide, rotation and 
reflection groups [3]. Indeed the operation need not be restricted to a single 
plane. Examples of similar repeating operations and the induced image relation 
are shown in the table below. 

Repeating operation and induced image transformation: 

Elation: 
    $I + \mathbf{v}\,\mathbf{l}_\infty^\top$, where $\mathbf{l}_\infty \cdot \mathbf{v} = 0$. 

Family of Planar Homologies: 
    $I + k\,\mathbf{v}\,\mathbf{l}_\infty^\top (\mathbf{l}_\infty^\top \mathbf{v})$ for integers $k$. 

Conjugate Rotation: 
    $H^n = I$ for an n-fold symmetry. 

Parallel Lines: 
    The imaged lines are concurrent. Equal spacing in world determines 
    world plane vanishing line. 

[Table rows "Example image" and "Schematic": figures not reproduced.] 
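
The elation entry can be checked numerically. Because the vanishing line and the direction vector satisfy l . v = 0, repeated application of the elation only scales the rank-one term: (I + v l^T)^k = I + k v l^T. The sketch below verifies this with NumPy; the particular vectors are arbitrary illustrative values, not data from the paper.

```python
import numpy as np


def elation(v, l):
    """Return the 3x3 elation I + v l^T; requires l . v = 0."""
    v = np.asarray(v, dtype=float)
    l = np.asarray(l, dtype=float)
    if abs(l @ v) > 1e-12:
        raise ValueError("an elation needs l . v = 0")
    return np.eye(3) + np.outer(v, l)


# Arbitrary example values with l . v = 2 - 1 - 1 = 0.
v = [2.0, -1.0, 1.0]
l = [1.0, 1.0, -1.0]
H = elation(v, l)

# Applying the elation k times equals a single elation with k * v.
for k in range(1, 5):
    assert np.allclose(np.linalg.matrix_power(H, k),
                       np.eye(3) + k * np.outer(v, l))
```

This is why one fitted elation is enough to predict the image position of every further repetition of the element along the translation direction.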




Similar grouping strategies can be developed for each of these examples. Since 
in man made scenes there are a plentiful supply of elements that do exactly repeat 
on planes, it is certainly worth building groupers for those repeating operations 
that commonly occur. It is clear there are always two aspects that must be 
considered when designing such groupers: 

1. Grouping geometry: Given a repeating operation in the world, determine 
the geometric relationships that are induced in the image between the imaged 
repeated elements. 

2. Grouping strategy: Develop a grouping strategy based on these relations. 
This will usually involve a choice on the degree of geometric and photometric 
invariance required for related elements. 

It is certainly plausible that efficient and reliable groupers can be built for virtually 
any class of exact repeating operation. However, several degrees of greater 
generality will be required for the non-exact repetitions that also commonly oc- 
cur: even if the repeating operation is on a plane, it is often the case that either 
the repetition is not exact, or the element is not exactly repeated by the repeti- 
tion. A brick wall has both these problems. This type of non-exactness can be 
modelled by drawing the repeating parameter from a suitable statistical distribu- 
tion. A far more demanding extension is to the type of repetition that occurs for 
leaves on a tree, where the colour, shape and size will vary from leaf to leaf (trees 
are like that), there is a wide (but not uniform) distribution of element poses 
and there are complex lighting effects produced by both leaves and branches. 



Constrained Symmetry for Change Detection 

Abstract. The automation of imagery analysis processes leads to the 
need to detect change between pairs of aerial reconnaissance images. Ap- 
proximate camera models are available for these images, accurate up to 
a translation, and these are augmented with further constraints relat- 
ing to the task of monitoring vehicles. Horizontal, bilateral, Euclidean 
symmetry is used as a generic object model by which segmented curves 
are grouped, first in a 2-d approximation, and then in 3-d, resulting in a 
sparse 3-d Euclidean reconstruction of a symmetric object from a single 
view. The method is applied to sample images of parked aircraft. 



1 Introduction 



1.1 Change Detection and Aerial Surveillance 

2 Recovering the Symmetry 

2.2 Unconstrained Problem 

nv g h un on n ov y o pi n y u v g n 



un 


P oj 


V y 


n 


1 


wo 


k 


4 . Ou 


pp 0 h 


. w 




on 


n 


ng 


P 


n on 0 


h 


u 


V wh 


h 


w 


nv n 


0 h 


y 




y n 


0 




0 


P oj 


V y 


n u 


1 


p 


n on 


h 


0 


nil 


on 


po n 


on 


h u 


V . ow 


V 


h 


OV 


y 0 


u h po n 


on u 


V 


hghly 


n 


^ 0 


no 


h ng 


n 




on 






0 


1 n 


h 


1 


0 




ho 


n wh 


h h u 


V 


g 


n 




p 


z 


y u 


ng 


h 


ng n 




wo 


nfl 


on po n 


. h 


u 


V hu 


p 


z 


y 




h 


V 


lly 


w h 


ny 


n n 


0 


0 


h 




u 


V g V n 


ju h 


0 


pon n 


0 


h 


P 0 


nfl on 


po n . 


h 


n 




n 0 


y h n 






V ly 


n 


0 


olv 


0 1 




0 


P oj 


V 




ho wh h 


opolog 


lly 


qu V 


1 n 


0 h 


lo 


n 


n 


0 




















On 


p 0 1 


w h 


h 


PP 


0 


h 


h u V 


p n 


on. 


nfl 


on po n 


y no X 0 


X 


n 




on 


0 


u V n 




ul 


0 


ov 


0 U ly. 


h 


V p 0 




0 


xp 




n 


u ng 


non lo 


1 


n 


nv 


n 


U V 


P 


n on 




on 


h 






pp ox 


on 1 . 


n 


h 


0 gn 


1 0 





u V pp ox y polygon wh h polygon v ho n o 1 

long h u V . h pp ox on g n u v ly y 1 ng h 

g h po n on h u v wh huh oh polygon 1 pp ox 



on. 




186 Rupert W. Curwen and Joe L. Mundy 



h n 1 pp ox on ju h 1 n h ough h u v n po n . h 
pp ox on y o pu v y p ly n g v goo v u 1 h 
whh uv. gono uv wh h hghuvu no pi 

o n ly n 1 ng h h n ho w h low u v u . 

2.3 Invariance of the Ramer Approximation 

n g n ng h pp ox on w no h o ny g n o u v h 
po n p k h n X polygon v x ho wh h h p p n ul 

n o hlnjonngh uv npon x 1. hponwll 

uhh h ngn oh uv p lllohlnjonngh npon. 



u p 


11 lln 


n p 11 1 un 


n n 


no 0 h po n 


P k 


on n 


n n 


0 U V 


W 11 0 


pon 0 


h po n 


on h 


0 g n 1 


u V . 


















1 0 u 


p y h 


n 


ng on 


on 0 h pp 


ox 


on. L 


^0 


h 


w n h 


u V : 


n h 1 n 


jo n ng n 


po n . 


L An 


h 


w 


n h u V 


n 


pp ox 


on wh n h 


n 


p n n 


h pp ox on. h n 


n h 


pp ox 


on wh n 11 


low 


on 


n V lu . 


n h 0 


0 


1 0 


n nv n 


h 


n on 


on 


on gu 


n 0 g V 


n 


n nv n 


1 pp ox on. 




n 


h p n 


0 0 lu on h 


p n 


on w 11 no 


nv 


n n 


h n 


po n 


n h 1 n 


g n 


u h 


pp ox 


on. 


ow V 


w h 


V h 0 


u w 


h h 


ho n 


ov ng y 




u h 


Ing 


n u 


fly wng 


hown 


n gu 2 


. h jo 1 


on 0 h 


PP 0 


h h 


0 no 


1 


hop 


0 h ng nil on po n . 


h 


n 


0 wh 


y h u 


0 lo 1 


u V p op 


u h 


olou 


n 


X u . n 


gu 2 


0 


h lo 1 


olou p op 


0 pu 


long 


h on 0 u V n 


h pp 


ox on 


n h u 


0 


on n 


h 


h. 














u 


h h 


ho n 


only 


ov 1 


0 y 




n lu ng 


n 


n 0 p oj 


V no 


on 0 pi n 


u V . w 


xplo 


h u 


0 


on 1 on 


n 0 olv 0 


3 1 


1 y y- 







2.4 Constrained Problem 

noupol o nh ovyoy y uh pi yh 

wo on n V 1 1 . h known n h y y u o 

ou pi n p p n ul o h g oun pi n . u h h g oun pi n 
x ly h 3 g on o n p ov y h n ly . 
h on n How uopoj h g konohg oun pi n . 
y u V wh h pp ox ly n h g oun pi n w 11 h n o 

2 ul nl ly yonhg oun pi n . h llu n 

gu 3. 

ho no u ppv .hpojonpo y 

olv ng nu lly o h n on o h y w h h 3 g oun 

pi n . 








Fig. 2. pi n y 

h hu h floo 1 
g n oh 
y y n o 



y OV 
o 

g . n wh 

OV h 



u ng h o 
o ly. n 1 k 

h g 

n ppl 



pp ox on. 
h o g n 1 g 
h ong n 



u h o h V ng OV 
pi n w n u h on n 
p p n ul o h g oun o 
h g . 



h 1 ly yxonhg oun 

h hSplno 1 ly y 

ho 3 onuonohuv n 



2.5 Recovering the 2-d Symmetry Axis 



g 


on 


w 


p 0 n h 


g u 


ng 


h 


g n on Igo 


h 


0 0 


hw 11 1 


n 


gh 1 n w 




0 h 




h In 


w 


h n 


P oj 


on 0 


h 


g oun pi n 0 


OV p 


P 




/ k w n 


h op 


100 


long 


In L 


1 


0 u h p 0 


ng. 


h g 


V 


n pp ox 


1 


1 


u 1 


n y 




y n 2 hown 


n gu 


3. 


h 


y y h 


only 


wo 


g 


0 


0 


ng h p 


0 


h 2 




X 0 y 


y on 


h 


pi n . 


ough 


P 


hn qu w u 


0 n 


h 




ong X 0 


y 


y- 









b) 

Fig. 3. ong In n 
h g oun pi n u ng 
u 1 n 1 1 y 




y 



long h y on o 

h y o n pp ox 





on 


P 


0 1 n g n li n I 2 




hown n gu 


4. 


h wo 


1 n 


g 


0 


h 0 h un h y 


y 


woul h V n 


X 


0 y 




y g V n 


y h 


ng 1 n 6. h k w 


0 n 


u 


h h 


P oj 


on 0 


h n 


I 2 on 0 6 ov Ip. 0 ng w 


P 


0 n 2 


1 n 


ough 


P 


wh 


h oc 


in n h p 0 


pon 


0 h ngl 


n 


1 ng h 



o hppn ul o hognohln. huho zon 1 1 n w p 
n y po n long on x n 1 n h ough h o g n w p n 
y po n long h o h x . 

On VO w u ul o 11 p o 1 n h ough p w 

00 h nhnno Iz oh uno uonoln np 

g V un o ough p . h n non x 1 upp on w p o 
n h op 20 po 1 x w x 

hpu V X w hn nk o ngoh ollngholn 
xpl n y hy y. huo hlnlGih fl on I w 







b 




>'■ \ 

/ \ 



Fig. 4. hp o gin VO oh nglntipov h h 

poj ono hlnonoh nglnovlp .pndx 

oun n 11 1 n wh h w oil n w h I w ov o L. h 

oil n In w h n p oj on o Z n h u o h p oj 1 ng h 
w lul .h u knov 11 In nhg oup L w h o o 
pu V X o y y. 

2.6 Grouping by 2-d Symmetry 

n ngonohh2y y pow ul g oup ng h n . 

gu on h ov yo y y x o oufl g 130 

. h glnhognl g n n o noy 

u h ong xoy y 11 o nh u wh h h 

un h y y ho wh h y on o un on 1 n y. 

h ng n ng n po on o h o 1 w ov 

3 Recovering 3-d Shape from Symmetry 

On 2 xoy yh nx only 11 p o 

ov h3 hpoh uu.hxoy y ov n2x 
p oj on o 1 n X wh hi onh uplno 1 ly yP 
n 3 h 1 n ng o unknown h gh ov h g oun pi n . 1 o P 
u o ppnul ohg oun pi n . h wo on n h 

pi n P o on n on 1 p n 1 o pi n wh h p o h p n 1 








0 pon o h po on X long h h y . h ly o pi n 

p p n ul o li g oun n w p ow h X w p o 

li g oun pi n ow h . n li y o j xp 

01 whnh gonon gvnyh nly hpnlopln o 

y y o on 1 o oun 





0 


g V 


n pi 


n 


0 y 






y 0 


h p n 


1 n h 


known 


0 1 


h 


P pol 


g 0 




y 0 


h 


0 


j 


ully 


n . 


gu 


llu 


h 


0 




on 0 


h 


P pol 


u 


V 


. po 


in u on 


g n 




u V 0 


pon 


w 


h 


y 


0 


h 






R. 


h 3 


g n 


ng po n 


U 


y 


ny po n 


on 


R. 


h 


u h 


pc 


t n 


n 




11 


n h 


pi n 0 


y 


y P 


n h n 


P oj 


u 


ng 


h 


V u 


1 




R 


wh h 


h fl 


on 


0 h 


R 


n 


P. 


hu 


h po n 


u 


P 


0 


h 


g un 


R P 0 ho 


gn 1 


y R. h 


u 


V 


h 


P pol 


1 n . 


0 


P P 


V 






gh 1 


n wh h 



p h ough h V n h ng po n o h y y. 



li p pol go y on n li po 1 o pon n un g v n 
y y- o po n wn oh ogn uvnh g 

only ho po n wh h 1 on h p pol In y o pon . hi h 



u 


n 0 


1 h 0 pon n 


only on o 


h po n 


oun 


long 


h 


p pol 1 n 


0 h po n n u 


V n u 


n h 




ul pi 


po n 


long h 


p pol In. ny wo 


u V g n 


wh h 0 


y h 


P pol 


on 


n y 


u 0 on u 


3 u V V 


n h y 


no 


h u 


0 


pon n 


. u h on n 


h n qu 










n 


ng 0 no h wh 1 


0 11 n oil 


14 13 


oun 


h 


U V 


ng n 


1 0 h p pol 1 n 


h n h 0 


pon ng 


U V 


U 1 0 




ng n 1 


h 0 no nv 1 


h p V ou 


n . 


ov 


h 3 



u V How olnh on oh y lyoo pon 

ng3 uv n gn o nyp o2 uv gn wh ho y 

h p pol on n . ow v n lyouv gn wx 

lu u h o pon n o on on h on n ng only on ho 

h wh h g V un qu 3 u v . 









Fig. 6. 



Abstract. Systems of coupled, non-linear diffusion equations are proposed 
as a computational tool for grouping. Grouping tasks are divided 
into two classes, local and bilocal, and for each a prototypical set 
of equations is presented. It is shown how different cues can be used 
for grouping given these two blueprints plus cue-specific specialisations. 
Results are shown for intensity, texture orientation, stereo disparity, optical 
flow, mirror symmetry, and regular textures. The proposed equations 
are particularly well suited for parallel implementations. They also show 
some interesting analogies with basic architectural characteristics of the 
cortex. 



1 Introduction 

Vision, more than any other sensory input, enables us to cope with the variability 
of our surroundings. The performance of biological vision systems is unrivalled 
by any computer vision system. One task they are particularly better at is 
so-called ‘grouping’. This is the crucial step of identifying segments that have a 
high chance of belonging together. Grouping acts as a kind of short-cut between 
low-level features and high-level scene interpretations, quickly assembling 
relevant parts. Unfortunately, little is known about the underlying ‘computational’ 
principles. 

In this paper, an attempt is made to formulate a set of working principles that 
seem to underlie biological vision. These are discussed in section 3. Then, in 
sections 4 and 5 a computational framework is proposed that seems apt to turn 
these principles into computer vision algorithms that work on real images. This 
is an important restriction, which rules out detailed modeling of the actual neural 
processes. The human visual system owes at least part of its power to the huge 
amount of processing units and connections. The result is not a faithful copy of 
biological visual processing but a model that exhibits useful functional analogies. 
But first, section 2 looks at the implications of defining grouping as redundancy 
reduction. 

2 Grouping as Redundancy Reduction 

Perception and grouping in particular have often been regarded as processes of 
redundancy reduction. Statistical information theory has been invoked to supply 
objective functions to be optimised by these processes, such as Shannon’s entropy 
and mutual information (for a recent example see Phillips and Singer [4]). 

Yet, statistical information theory does not tackle the core problem, which is 
to define redundancy in the first place. Consider the following strings of bits: 
‘0 0 00 00 0 0 00 ’ and ‘1010101010101010101010’. In terms of Shannon 
entropy, both yield the same ‘redundancy’. There are 50 0’s and 50 1’s in both 
cases. The second string definitely seems more ordered, i.e. seems to contain more 
redundancy, however. Dealing with such differences is the realm of algorithmic 
information theory. Of course, one could assign new codes to pairs of subsequent 
bits, like ‘00’-‘01’-‘10’-‘11’, in which case the second bit string would suddenly 
seem very ordered from the viewpoint of statistical information theory as well. 
But this requires the string to be recoded appropriately. 

Algorithmic information theory deals precisely with this problem. Simply 
put, it tries to find the shortest computer program (which amounts to a bit 
string) that produces the given bit string. It would not be easy to find a program 
that produces the first bit string and that would be shorter. For the second string 
a simple repeat ‘10’ eleven times would suffice. The longer such a regular string, 
the greater the gain in bits would be. The real complexity of the data is defined 
as the length of the shortest program that generates them. Hence, the second 
string is of less complexity than the first. 
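
The contrast between the two notions of redundancy can be illustrated with a few lines of Python. This is a rough sketch, not part of the original paper: the length of a zlib-compressed string is used as a crude, computable stand-in for the length of the shortest generating program, the strings are longer than those in the text so that the compressor overhead does not dominate, and the irregular string is simply a seeded shuffle of the regular one.

```python
import math
import random
import zlib
from collections import Counter


def symbol_entropy(bits):
    """Shannon entropy in bits per symbol, from single-symbol frequencies."""
    counts = Counter(bits)
    n = len(bits)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def compressed_size(bits):
    """Bytes needed by zlib; only a proxy for algorithmic complexity."""
    return len(zlib.compress(bits.encode("ascii"), 9))


regular = "10" * 500                    # perfectly periodic string
shuffled = list(regular)
random.Random(0).shuffle(shuffled)      # same symbol counts, no obvious order
irregular = "".join(shuffled)

for name, s in (("regular", regular), ("irregular", irregular)):
    print(name, symbol_entropy(s), compressed_size(s))
```

Both strings have the same single-symbol entropy, yet the periodic one compresses to far fewer bytes, which mirrors the argument that symbol statistics alone do not capture order.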

The ideal grouping device would solve this problem from algorithmic information 
theory. Such a device would come up with an ordering principle (the program) that 
maximally compresses and thereby ‘explains’ the data. Algorithmic information 
theory states that only a very small fraction of strings will allow an appreciable 
degree of compression [3]. This underpins the ‘non-accidentalness’ idea in that 
random data have a very low chance of showing appreciable order [8, 9]. 

Unfortunately, algorithmic information theory also shows that in general 
maximal redundancy reduction cannot be achieved. Quoting from Chaitin [3]: ‘The 
recognition problem for minimal descriptions is, in general, unsolvable, and a 
practical induction machine will have to use heuristic methods’. 

One might argue that image data are far from random and that appreciable 
degrees of redundancy reduction are possible right away, e.g. based on 
correlated feature values at nearby pixels. Segments can quickly emerge that way and 
higher order groupings could be formed based on these initial results. One could 
abandon further attempts once the search for further order gets difficult and 
hence the problem of having to deal with random data is not an issue. 
Nevertheless, almost all grouping phenomena in human vision have been demonstrated 
to work with stimuli like random dot patterns. Hence, the human visual system 
doesn’t seem to constrain the input data it can handle, but rather the 
regularities it can find. Indeed, introducing heuristics in the search for order is the only 
approach that can prevent the system from performing exhaustive search and 
thus becoming unacceptably slow. Thus, we have to accept the idea of applying 
a restricted class of regularity detecting schemes and letting other types of 
redundancy go by unnoticed. 

An example of how human vision also exhibits such behaviour is given by 
symmetry detection. See fig. 1. Mirror symmetry in random dot patterns is known to 
be very salient. However, if a small region around the symmetry axis is replaced 
by a non-symmetrical random dot pattern, the overall impression of symmetry 
is seriously weakened. Although the level of redundancy has been decreased only 
slightly by this simple change, the redundancy is much harder to detect [6]. 




(a) (b) 

Fig. 1. (a) asymmetric random dot pattern; (b) symmetric random dot pattern; 
actually (a) is symmetrical except for a vertical strip around the symmetry axis. 
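
A stimulus of the kind shown in the figure is easy to generate. The sketch below is an illustration only, with hypothetical function names and arbitrary sizes and dot density; it builds a mirror-symmetric random dot pattern, replaces a narrow vertical strip around the axis by fresh random dots, and measures which fraction of pixels still agrees with its mirror image.

```python
import numpy as np


def symmetric_dots(height, half_width, density, rng):
    """Binary dot pattern that is mirror symmetric about its vertical mid-line."""
    left = rng.random((height, half_width)) < density
    return np.hstack([left, left[:, ::-1]])


def replace_central_strip(pattern, strip_half_width, density, rng):
    """Copy of `pattern` with the columns around the axis redrawn at random."""
    out = pattern.copy()
    mid = pattern.shape[1] // 2
    lo, hi = mid - strip_half_width, mid + strip_half_width
    out[:, lo:hi] = rng.random((pattern.shape[0], hi - lo)) < density
    return out


def symmetry_fraction(pattern):
    """Fraction of pixels that equal their mirror image."""
    return float(np.mean(pattern == pattern[:, ::-1]))


rng = np.random.default_rng(0)
full = symmetric_dots(64, 32, 0.2, rng)
broken = replace_central_strip(full, 4, 0.2, rng)
print(symmetry_fraction(full), symmetry_fraction(broken))
```

With these settings at least seven eighths of the columns remain exactly symmetric after the strip is replaced, so the measurable redundancy drops only slightly, which is the point made in the text about how much harder it nevertheless becomes to see.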



Algorithmic information theory gives justification to a tradition of listing 
separate grouping principles known from introspection and perception research. 
Examples are the ‘Prägnanz’ rules of the Gestaltists [23] and Lowe’s non-accidental 
properties [9]. Grouping methods should focus on mechanisms with maximal 
effect, avoiding a proliferation of grouping rules. Then computation time and 
resources are well spent. Emulating the grouping principles that nature applies 
seems a sound strategy to that end. The next section deals with the guidelines 
one might extract from the brain’s architecture and behaviour. 

3 Computational Grouping Principles 

This section tries to draw conclusions from the best grouping systems known 
to date: biological vision systems. It focuses on macro-properties of visual 
processing, as we are interested in computational principles rather than their actual 
implementation in the brain. 

Massive, but structured parallelism. Neurophysiological observations of 
mammalian vision systems show that neural structures perform a massively parallel 
kind of processing. Moreover, the neuronal substrate is not like a soup of millions 
of these small processors. From the retina, over the lateral geniculate nucleus 
(LGN), up to the striate, prestriate, and further cortical areas, one finds a high 
degree of organisation. Neurons work in parallel within layers, which are 
simultaneously active and together form modules and areas, which in turn jointly 
create the visual percept. This fine-grain spatial parallelism in combination with 
functional parallelism also seems a prerequisite for the fast responses of human 
vision. 

Local Interconnections. Parallelism does not imply highly connected networks. 
In the case of the brain, there are an estimated 10^® synaptic connections, for an 
estimated total of 10^^ neurons. This leaves us with an average of about 10,000 
connections for each neuron. Then also take into account that there typically 
are multiple connections between the same two neurons and the picture is one of 
sparse rather than dense connectivity. In summary, a neuron’s connections are 
applied very parsimoniously, with an emphasis on local connections in a retinotopic 
sense (and within a layer and area also in a physical sense for that reason). 

Data Driven Processing. The responses of human vision to random dot data 
suggest there is a strong data-driven aspect to early vision. Such data can yield 
strong impressions of symmetry, coherent motion, and depth. Treisman [9, 20] 
carried out experiments with artificial stimuli, where simple cues like colour, 
orientation, etc. were sufficient to let deviating components pop out immediately. 
It is hard to see how such performance could be due to the extensive use of 
expectations about the world. Far from suggesting that bottom-up and top-down 
processes ought not to collaborate, there nevertheless is good reason to believe that 
the initial visual stages depend primarily on the retinal input. 

Specialized modules. Both neurophysiology and psychophysics indicate the 
existence of more or less independent channels for the processing of different aspects 
of vision (colour, orientation, motion, depth) [2]. The existence of specialized 
visual modules and areas in the cortex is well-established by now [25]. This 
specialisation of areas is backed up by the selective connections between them. 
From this one can conclude that the visual information is processed in parallel 
in different areas or modules therein, and that special attention is paid to each 
of a number of basic cues. 

Bi-directional coupling. Neurophysiological terms like ‘visual pathways’ suggest 
a model with ‘lower’ areas feeding into ‘higher’ areas. But as Zeki [24] puts it: 
“most connections in the cortex are reciprocal, and in the visual cortex there is 
no known exception to this rule”. Such reciprocal connections form the 
anatomical basis for feedback and can also help to arrive at an effective integration of 
information (cue integration, data fusion). 

Non-Linearity. Many of the convolution masks used in computer vision have their 
counterpart in the brain, so it seems. However, already quite early in the visual 
cortex neurons are found with a markedly non-linear response to the intensity 
and spectral pattern in their receptive fields. Complex and hypercomplex cells 
are well-known examples. Also, increasingly one became aware of influences that 
stimuli in a wider surround can have. Effects like illusory contours also suggest 
non-linear processing, and all the more so as they may appear or disappear upon 
small changes to the stimulus. 

Explicit representation of boundaries. Basic features such as luminance, colour, 
motion, depth, etc. can all generate the impression of discontinuities that 
separate regions homogeneous in that cue [2]. The presence of orientation sensitive 
cells in areas that are specialised in the detection of different cues [24] can also 
be interpreted as an indication that contour mechanisms operate in each of these 
basic feature maps separately. It seems there is a distinction to be made between 
at least two types of processes: those that detect homogeneity of some sort within 
image segments and those that detect their boundaries. Zucker [26] referred to 
these processes as Type and Type processes, resp., and Grossberg [4] called 
them feature and boundary systems. In the case of the luminance cue, the brain 
has the ability to form illusory contours [22]. 

Local and Bilocal grouping. One can distinguish two types of detectable order. 
One corresponds to regions with homogeneous local characteristics, such as 
luminance, colour, or texture orientation. The second type combines cues at two 
different locations. Mirror symmetry detection is a good case in point. It requires 
the detection of similar cues at position pairs. Similar processes seem called for in 
the case of Glass patterns [2]. The visual system seems to extract displacement 
fields (vector fields). Motion and stereo are cues where a similar process would 
be useful, with the addition of having the two locations defined at different 
moments in time or in the input from two eyes. The particular relevance of both 
local and bilocal processes could also be reflected in the particular importance 
of 1st and 2nd-order statistics in Julesz’s texture segmentation experiments. It 
is also interesting to speculate about the local/bilocal divide as an alternative 
for the ‘what’ and ‘where’ pathways, resp. 

Maximal usage of a limited number of grouping mechanisms. Separate, specialised 
modules do not imply that the underlying implementational mechanisms are 
completely different. From an evolutionary point of view one could expect that 
a successful computational scheme is duplicated to solve other tasks. There are 
good anatomical indications for this [7], emphasised by the term ‘isocortex’. 
This suggests that the different types of grouping exhibited by the brain can 
be implemented as variations of a few basic blueprints. Of course, variations 
between areas are to be expected as a result of specialisation. 

Compliant regularity detection. Maximal usage of grouping principles also implies 
some degree of flexibility. Grouping methods should tolerate deviations from the 
ideal regularities, in order to be useful in natural environments. Hardly any 
of the observed, real-world regularities are of an all-or-nothing nature. Human 
observers are e.g. capable of detecting mildly skewed symmetry in random dot 
patterns. 

In conclusion, the aim of this work was to create a grouping framework according 
to the following guidelines: 

- Allow fine-grain parallelism, 

- with mainly local interconnections between retinotopically organised nodes. 

- The processes are primarily data driven, based on a restricted number of 
basic image cues. 

- Processing should be carried out by different, specialized feature maps, 

- including non-linear processing, 

- bidirectional coupling between maps, and 

- explicit representations of both feature homogeneity within segments and 
variations that signal boundaries. 

- Grouping processes will be of two main types - local or bilocal - and 

- will be based as much as possible on but a few computational blueprints. 

- They will also allow for deviations from ideal regularities. 

In the next two sections, two blueprints for local and bilocal grouping are 
described. For each, specialisations towards specific grouping applications are 
discussed and results on real images are shown. 

4 Local Grouping 

Probably the most straightforward example of a local grouping process is the 
detection of regions of approximately constant intensity. This first example is also 
selected because it best shows the relation of the proposed grouping framework 
with regularisation based techniques and anisotropic diffusion, of which it could 
be considered a combination. 

The original intensity ( ) of the different image points with coordinates ( ) 
will be changed into new values ( ) such that small variations are suppressed. 
The function ( ) will be referred to as the intensity map. Simultaneously, a 
discontinuity map ( ) is constructed. Both processes are governed by a 
non-linear diffusion equation and they evolve while influencing each other through 
bidirectional connections. 

These equations have been derived as follows. Regularisation functionals are 
the point of departure. They typically impose a solution that should strike a 
balance between smoothness and faithfulness to the input signal (the original 
intensity in our case). The more sophisticated regularisation schemes also include 
discontinuities but penalise their creation in order to avoid over-fragmentation. 
Such functionals typically are non-convex, which makes them difficult to 
extremize. The most studied of these functionals is [2] 

i B = I ( ? + /3( - 2 +u\B\ . 

In the image region, boundaries B can be introduced at discrete locations. 
The first term pushes for an intensity map that is smooth, while the second will 
keep it from drifting far from the original intensity. Where boundaries are 
introduced, these terms are put to zero. The third term penalises the introduction 
of a boundary pixel. This functional regularises the intensity map and B 
simultaneously. 

There are at least two problems with such functionals. The first has already 
been pointed out and is the difficulty of finding the optimum. A second, major 
problem is the unexpected characteristics that the optimum may have. In the 
case of this particular functional, edges will always intersect as triples in vertices 
with 120° between them. This is neither desirable nor intuitively clear. 

There is not much one can do about the second problem if one works via 
functionals that prescribe global behaviour. Here we work through local prescriptions 
of grouping behaviour. The first problem can be reduced by going via related, 
convex problems. The graduated non-convexity technique (GNC) captures the 
solution as a limit case of a series of convex approximations [2]. Ambrosio and 
Tortorelli introduced a discontinuity indicator which is a kind of smoothed 
version of B. They replaced the term ν|B| by 




The first term is a smoothing term and the second a term that tries to keep 
boundaries localised. Shah [8] proposed to replace this single functional by a pair 
of functionals that are minimised together. The original, difficult optimisation 
problem is replaced by the solution of a system of diffusion equations 

2_2 _ ( _ -=P~" -- + 2 ( . 

t ^ t p 

These evolution equations are calculated for all image pixels. At each iteration, 
only information from neighbouring pixels is needed. The first equation strikes 
a balance between smoothing (first term) and keeping the map close to the initial 
intensities (second term). The second equation governs the boundary strength. 
Again, it is smoothed (first term) and kept small (second term) unless the local 
intensity gradient is large (third term). Near edges it will be pulled towards 1, 
elsewhere towards 0. Spatially, it varies smoothly between these extreme values. 
Note that the intensity map influences the discontinuity map but not vice versa. 
A drawback is that both the intensity map and the discontinuity map are blurred. 
This problem can be alleviated by replacing the linear diffusion operators by the 
anisotropic diffusion operator of Perona and Malik [3]. In the first equation, 

this is iv( — , with e re sing fun tion of ||— || or n the se on 

equ tion simil r h nge is m e, with fun tion th t e re ses with — — 



The mo ul tion f tors -0 n n e i erent fun tions of They n e 
hosen ^ n = , there y st ying lose to h h’s equ tions f one 

refers stronger e ge ete tion n less v ri tions within regions ip = n 
= — is etter hoi e 6 The e e t of the mo ilie equ tions is shown in 

a b c 



Fig. 2. a: Part of a SPOT satellite image (of an agricultural area), b: Intensity 
map (f) = constant, {= — ). c: Discontinuity map. 



fig. 2. The original image is shown in (a), the intensity map after 30 iterations 
in (b), and the discontinuity map in (c). The noise in the intensity values has been 
reduced, while edges have been sharpened. The discontinuity map has values close 
to 1 (bright regions in (c)) near the boundaries of regions with homogeneous 
intensity. 
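
The qualitative behaviour described here, noise reduction inside regions with edges kept sharp, can be reproduced with the anisotropic diffusion of Perona and Malik on its own. The sketch below is not the coupled system of this paper (it has no separate discontinuity map); it is a minimal explicit scheme on a 4-neighbour grid, and the conductance function, `kappa`, `step` and the iteration count are illustrative assumptions.

```python
import numpy as np


def perona_malik(image, n_iter=30, kappa=0.1, step=0.2):
    """Explicit Perona-Malik diffusion with 4-neighbour differences.

    The conductance exp(-(d / kappa)^2) is close to 1 for small intensity
    differences d (smoothing) and close to 0 for large ones (edges kept).
    `step` should not exceed 0.25 for this scheme to stay stable.
    """
    def conductance(d):
        return np.exp(-(d / kappa) ** 2)

    u = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        padded = np.pad(u, 1, mode="edge")     # zero flux across the border
        d_north = padded[:-2, 1:-1] - u
        d_south = padded[2:, 1:-1] - u
        d_east = padded[1:-1, 2:] - u
        d_west = padded[1:-1, :-2] - u
        u = u + step * (conductance(d_north) * d_north
                        + conductance(d_south) * d_south
                        + conductance(d_east) * d_east
                        + conductance(d_west) * d_west)
    return u


# Synthetic test image: a vertical step edge with mild additive noise.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0
noisy = clean + 0.02 * rng.standard_normal(clean.shape)
smoothed = perona_malik(noisy)
print(np.std(noisy[:, :12]), np.std(smoothed[:, :12]))
```

On this synthetic step edge the noise inside each half is reduced while the contrast across the edge is essentially unchanged, because the conductance is practically zero for differences much larger than `kappa`.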
The basic cue used in the previous example was intensity. The orientation of 
local texture is another basic cue. As orientation is defined modulo π (contrast 
polarity is discarded), differences can better be measured as sin 2( - ) [7]. 
A second difference with intensity is that orientation has to be estimated from 
the outputs of a whole bank of oriented filters with a discrete series of preferred 
orientations. The modified equations are 

— = ( ( sin (2 - ^ Sq sin (2 - ( + 

— = P ^ “7 + 2 ( - sin(2 

n these equ tions st n s for the out ut of the filter with referre orient tion 
There re filters, with orient tions tt rt oth equ tions h ve the 

s me stru ture s efore or inst n e, in the first one term im oses ontinuity 
on the fe ture t h n , the se on tries to kee the estim te orient tion lose 
to th t suggeste y the r w filter t 6 
An example of orientation based grouping is shown in fig. 3. Figure 3(a) shows 
a microscope image of a metallic structure. The texture consists of several subparts 
with homogeneous orientation. Six filters were used, with preferred orientations 
30° apart. The end state of the orientation map (first equation) is shown in 
fig. 3(b). The orientation is coded as intensity. The different segments are clearly 
visible. Fig. 3(c) shows the end state of the discontinuity map (second equation). 
Bright parts are where boundary strength is high (close to 1). As in the human 
visual system, the effect of combining the outputs of the filters by these equations 
is that orientation can be determined with better precision than their orientation 
differences might suggest (so-called “hyperacuity”). 

um n er e tion of intensity shows num er of e ul rities, su h s M h 

n e e ts ( er e t of over- n un ershoots of intensity ne r oun ries 
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Fig. 3. (a) texture of a slice of metallic material, as seen through a microscope; 
(b) result for the orientation map, with orientation coded as intensities; (c) result 
for the discontinuity map. 



n illusory ontours ( er e t of e ges where they re om letely sent e 
h ve not trie to mo el the 1 tter in this grou ing fr mework yet The M h 
neet neroueif Iso higher eriv tives of intensity re use in the 
equ tions 6 

5 Bilocal Grouping 

5.1 The Bilocal Blueprint 

Bilocal grouping processes such as the extraction of motion vectors, stereo
disparities and symmetric point pairs involve combining points at two different
positions, albeit maybe at different instances or in different images. These
point pairs are constrained to have similar values for a basic feature, such as
intensity or orientation. Here only intensity will be used for the illustration
of the different bilocal processes.

In order to fix ideas, we present a bilocal process for optical flow. The
equations of Horn and Schunck [5] serve as our point of departure. The optical
flow vectors (u, v) are found as the solution of

— =^u- { .u+ . + — =2- { .u+ . + ( 

with subscripts on the intensity indicating partial derivatives. A problem with
this system is that the linear diffusion operators force the displacement field
to blur motion boundaries. Furthermore, it is only effective in guiding the
search for optical flow in as far as the local intensity profile varies
linearly within a neighbourhood of dimensions comparable to the motion
distance. Both these problems are tackled by adapting the equations.

Fig. 4. a) The assumption that the intensity profile varies linearly may cause big
errors for larger motions, b) Schematic overview of the proposed bilocal scheme
(dual scheme).

The first problem can be alleviated if the diffusion operators are replaced by
anisotropic diffusion. The second problem can be handled precisely by switching
to a bilocal rather than Horn and Schunck's local formulation. The underlying
idea can be explained most easily for the 1D case. Suppose we are interested in
the motion for a point with surrounding intensity profile as shown in
fig. 4(a). According to the optical flow constraint that Horn and Schunck use,
the velocity is estimated as minus the ratio of the temporal to the spatial
intensity derivative. This is correct over infinitely short time lapses, but
deviations can be severe when images are taken at video rate or slower. The
problem lies in the fact that the intensity profile is not linear over the
distances traveled by the points between the two images. The problem could be
reduced if we had a good approximation of the motion. A similar optical flow
constraint can then be formulated for the residual motion, which is much
smaller and for which the intensity profile therefore has a higher chance of
obeying the linearity condition. In [5] the following mathematical
reformulation of the optical flow constraint is derived:

u+ + {uo 0=0 



where 

( t — ( — uq t — Q t t — t 

[uo 0 = — Uo — 0 H 

and with (u_0, v_0) the approximation for the motion vector. The result is a
bilocal expression that needs information at the point itself in the second
image and at the position shifted back by the approximate displacement in the
first. The procedure consists of successively updating the displacement
estimates and using them to compute the spatial (or temporal) gradients at a
shifted location, which changes with the latest displacement estimates.
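
The successive-update idea can be illustrated with a minimal 1D sketch. It is not the coupled diffusion system of the paper: it treats a single point in isolation, assumes the two intensity profiles are available as continuous functions, and all names (`refine_displacement`, `i1`, `di1`, `i2`) are hypothetical. The second profile is read at the point itself and the first at the position shifted back by the current estimate, so each step only has to explain the residual motion.

```python
import math

def refine_displacement(i1, di1, i2, x, u0=0.0, iterations=10):
    """Iteratively refine a 1D displacement estimate at position x.

    i1, i2 : intensity profiles of the first and second image (callables)
    di1    : spatial derivative of i1 (callable)
    u0     : initial guess for the displacement
    """
    u = u0
    for _ in range(iterations):
        grad = di1(x - u)            # gradient at the shifted location
        if abs(grad) < 1e-12:        # no gradient information: stop
            break
        # linearised constraint for the residual motion only
        u += (i1(x - u) - i2(x)) / grad
    return u

TRUE_SHIFT = 0.8
first = math.sin
second = lambda x: math.sin(x - TRUE_SHIFT)   # first profile moved by 0.8

one_step = refine_displacement(first, math.cos, second, 1.0, iterations=1)
refined = refine_displacement(first, math.cos, second, 1.0, iterations=10)
```

With a shift this large a single linearised step is far off, while repeated re-evaluation at the shifted location settles on the true displacement.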

There are different ways to obtain the initial guess for the displacements. One
is to first apply the traditional Horn and Schunck equations. Even if the
motion vectors are imprecise at places, here and there the estimated motions
will be close to the real ones. These spots allow the process to lock on to the
correct motion field, and the correct solution will spread through the
diffusion process. Secondly, multi-scale techniques can help to quickly bridge
larger distances. Thirdly, there always is the possibility to kickstart the
system from an initial, random field. Again, places where the motion happens to
be close to the real solution can suffice to bootstrap the whole process.



Apart from the anisotropic diffusion and the bilocal optical flow constraint, a
third difference with the Horn and Schunck system is an additional boundary
process [5]. Creating such a map is more intricate for the bilocal processes.
Near boundaries, parts of the background get occluded or become visible. These
areas have no corresponding points in the other image. Hence, in one of the
images the displacement field is undefined. This creates an interesting
asymmetry between the images for such points, which does not exist for points
that are visible in both images. Indeed, one can consider two displacement
fields: where points in the first image go to in the second and vice versa.
Consider a displacement vector from a point in the first image to its
corresponding point in the second, and then add the displacement vector from
this latter point to its corresponding point in the first image. The two
displacement vectors will annihilate each other, except in boundary regions.
The displacement vector for a point only visible in one image will make no
sense. The vector in whatever point is assigned to it will not obey the
annihilation rule.

The magnitude of the sum C of the two vectors can therefore be used to detect
boundaries. Indeed, we can let this magnitude drive a discontinuity map of the
type that we used for the local processes. In this case, it might be more
appropriate to call it a discrepancy map, as complete regions might obtain high
values because they are only visible in one of the images. In fact, there are
two such sum vectors: one can start either from the first or the second image
and take the corresponding point's vector to form the sum. In order to detect
occluded and disoccluded regions such a dual scheme is necessary. Displacements
from the first to the second image and displacements from the second to the
first image are extracted. The overall system is schematically shown in
fig. 4(b). It consists of six coupled diffusion equations, four of which
describe the feature maps, i.e. the motion vectors of the dual scheme (both
forward and backward); the other two measure the discrepancy from the point of
view of each of the images.

The discrepancies (again one for each of the dual schemes) guide the
anisotropic diffusion in the maps that extract the motion components u and v.
For a detailed account the reader is referred to [5].
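
The annihilation rule can be sketched on a 1D toy example. This is only an illustration under simplifying assumptions (integer displacements, one scan line, displacement fields given rather than estimated); the function name `discrepancy` and the example fields are hypothetical and not taken from the paper.

```python
def discrepancy(forward, backward):
    """Magnitude of forward + backward displacement for every pixel.

    forward[x]  : displacement of pixel x towards the other image
    backward[x] : displacement of pixel x of the other image back again
    """
    n = len(forward)
    result = []
    for x, f in enumerate(forward):
        target = min(max(x + f, 0), n - 1)   # corresponding pixel, clamped
        result.append(abs(f + backward[target]))
    return result

# An object covering pixels 2-3 moves two pixels to the right over a static
# background. Pixels 4-5 of image 1 get occluded, pixels 2-3 of image 2 are
# disoccluded; both were (arbitrarily) assigned zero displacement.
first_to_second = [0, 0, 2, 2, 0, 0, 0, 0]
second_to_first = [0, 0, 0, 0, -2, -2, 0, 0]

from_first = discrepancy(first_to_second, second_to_first)
from_second = discrepancy(second_to_first, first_to_second)
```

Here `from_first` is non-zero only on the pixels that become occluded, and `from_second` only on those that become visible, which is why both directions are needed.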

5.2 Bilocal Specialisations 

As mentioned earlier, the bilocal processes are introduced to handle grouping
based on cues such as motion, stereo, and symmetry. The detection of regular
texture is another example. Each of these benefits from certain adaptations.

Motion sequences typically consist of more than two images. Better precision is
obtained by considering frames that are separated by a larger time lapse.
Displacements between such frames should be the sum of frame-to-frame
displacements, and hence a good initialisation is available. Computation speed
from frame to frame can be increased as well, as motion fields will normally
not change drastically from one frame to the next. The latest displacement
field can serve as an excellent initialisation for the next frame. For the
examples in this paper, a multiview approach was used.

For stereo the two images have to suffice. If disparities are rather large,
there might be a problem of initialisation. A multi-scale approach can
alleviate this problem. At all scales a bilocal strategy is used, however. Just
applying blurring linearises the intensity profile but also increases problems
of intensities getting 'contaminated' by parts not visible in the other image.
Also, precise localisation still benefits from the possibility to work locally
around two different positions in the two images. A further specialisation is
possible for camera configurations yielding horizontal epipolar lines. In that
case the vertical disparity components can be put to zero and the system
reduces to a set of 4 coupled equations. Even if epipolar lines do not run
perfectly horizontally, it is often useful to start the system up with only
these 4 equations active, to get a good initialisation more rapidly, and then
plug in the additional equations for the vertical components.

For symmetry a more fundamental adaptation is required, as ideally nearby
displacement vectors are not the same but vary in a specific way. Corresponding
points lie diametrically along a symmetry axis. This means that the
displacement field shows a rather steep gradient, which cannot be handled by a
simple smoothness operator. Furthermore, the actual position of the axis is not
known beforehand, which complicates the search process. The bilocal process can
be adapted to look for symmetry of a predefined orientation (or an orientation
close to that predefined orientation, for that matter). This might seem a
rather clumsy solution, but orientation effects in human symmetry detection
suggest that the brain also has to perform a kind of scan over possible
orientations. The order of this scan has been the subject of intensive debate
in perception research. To give an idea of how the bilocal scheme can be
adapted, a vertical symmetry module is discussed. In that case we only have to
deal with horizontal displacement. The following changes are made:

— First of all, the smoothing operator is modified so that it takes the
desired change in the horizontal displacement u with x into account.

— The horizontal gradient in the equations has to be inverted, since along the
horizontal direction corresponding points have mirrored intensity profiles.

— Two opposite displacement fields with relatively short displacements are
taken as initialisation of the dual displacement scheme.

This modified bilocal process successfully locks on to the symmetry near its
axis. From there it spreads to areas further from the axis. Psychophysical
findings on human symmetry perception suggest a similar process. It would
explain why disruption of symmetry near the axis has such a profound impact on
symmetry detection. The model is also consistent with human vision in the sense
that symmetry need not be perfect. The axis can be somewhat curved, intensity
at symmetric positions doesn't need to be identical, etc.

For the detection of regular texture the main difference from the basic scheme
lies in the fact that several displacement fields - each corresponding to one
type of periodicity - can coexist. We want to find all the different, smallest
periods in the pattern. This is achieved by initialising several constant
displacement fields with different orientations and vector lengths. They will
lock on to the periodic structure if there is local agreement. These matches
propagate to other areas as the initial displacement fields get distorted in
order to follow the variations in texture orientation and spacing. Several of
the initialisation fields may lock on to the same periodicity, but if they
sample different orientations and displacements densely enough, every type of
periodicity will be 'detected' by at least one initialisation field. Finding
out whether a periodicity has been detected is easy, as the discrepancy maps
will show low values.

5.3 Examples of Bilocal Grouping 

Fig. 5 shows an example of motion extraction. Cars drive with different speeds
on a crossroad. Part (a) shows one of the frames of the video input. Part (b)
shows the magnitudes of the extracted motion vectors. Brighter means faster.
The outlines are sharp, but due to the moving shadows on the ground they do not
precisely coincide with the outlines of the cars. Parts (c) and (d) show the
discrepancies for each of the dual schemes. One (c) detects the parts of the
road that become visible again, while the other (d) shows parts that are
getting occluded. In fact, when one ANDs both maps one can get the contours of
the cars, and an OR operation on the maps yields the regions that are not
visible in one of the views.

Fig. 5. a: Frame from a video of a traffic scene, b: Magnitudes of the extracted
motion vectors, c,d: Discrepancies for each of the dual schemes.

Fig. 6 gives a second example of motion extraction. Five frames of a swimming
fish were taken as input. Part (a) shows one of the frames. The velocities
calculated for two subsequent frames are shown in (b). Part (c) shows the
result for the velocity from the first to the fifth frame, based on the sum of
frame-to-frame motions as initial displacement fields. As can be seen, the
object is delineated very sharply from the background and the velocity is very
homogeneous over the fish's body.

Fig. 6. a: One of 5 frames with a swimming fish, b: Velocity magnitude when
motion is only calculated between two frames, c: Velocity magnitude when motion
is calculated between the first and the fifth frame.

Fig. 7 (a) shows a stereo pair of two manikins. As part (b) shows, the depth
discontinuities are precisely localised in the discrepancy maps. Note that the
background is also highlighted as yielding irrelevant disparity data. This is
achieved by an additional 'texture map', another diffusion process that divides
the scene into textured and untextured parts [7]. Disparities are suppressed
for untextured parts. Part (c) shows two views of the 3D reconstruction made on
the basis of the extracted disparities. The disparities were found with the
help of a multi-resolution approach. A three-level image pyramid was used.

Fig. 8 shows an example for a face, which is only weakly textured compared to
the manikins. Nevertheless, the reconstruction is reasonable. Again,
disparities on the background were suppressed through the texture map.

Figure 9 shows two scenes with symmetry. One shows an X-ray of a human thorax.
The symmetry is not perfect. The places with zero displacement indicate the
position of the symmetry axis. They are highlighted in the figure. The same was
done for the head-shoulder scene ('Claire'). The process is seen to be quite
tolerant to deviations from ideal symmetry.




Fig. 7. a: Original stereo image pair, b: Resulting discrepancies, c: Two views
of the 3D reconstruction.

Fig. 8. a: Original stereo image pair of a face, b: Two views of the reconstructed

Fig. 9. Examples of symmetry detection; b: X-ray image of a thorax, c:
head-shoulder scene.



The detection of regular texture is illustrated in fig. 10. Part (a) shows the
input image. The goal is to find the repeated textures of the shirt. Part (b)
highlights regions where regular texture was found. Darker zones separate the
different pieces of textile where they are knitted together. Part (c) gives an
idea of the precision of the extracted periodicities. Using a
shape-from-texture approach, local surface orientations are given for points
clicked on by the user. The orientations fit our visual expectations.




Fig. 10. a: Original image, b: Segmentation of relevant periodical texture areas, 
c: Estimated surface orientation using shape-from-texture. 






In the bilocal processes described so far, point pairs are preferred with
identical values for the basic feature (here only intensity). When presented
with different options the visual system need not even prefer the solution with
identical values, however [2]. We have built extended bilocal schemes that
allow intensities to change between views (with spatially varying scale +
offset). A discussion is out of the scope of this paper.

6 Conclusions 

In this paper we have argued that grouping is necessarily based on
task-oriented sets of rules, not a universal principle. We have then tried to
set out guidelines for the selection and implementation of grouping rules. The
framework that we propose is based on the evolution of coupled, non-linear
diffusion equations. In these systems, each of the equations determines the
evolution of a relevant image cue, such as intensity, local texture
orientation, motion components, etc. Feature maps come with discontinuity maps
of their own, which make explicit the boundaries of segments that are
homogeneous in those features. The maps are organised retinotopically, and the
number of connections each pixel would need for such equations to be
implemented on hardware with fine-grain parallelism can be kept low.

Future research will be aimed at combining different maps into a single system.
Work will be needed on the connections between cues. As an example, the 3D
reconstruction of the face in fig. 8 can be improved by exploiting its
symmetry.

Acknowledgment: Support by Esprit-LTR 'Improofs' is gratefully acknowledged.



Integrating Geometric and Photometric
Information for Image Retrieval 

Cordelia Schmid^1, Andrew Zisserman^2, and Roger Mohr^1

^1 INRIA Rhône-Alpes, 655 av. de l'Europe, 38330 Montbonnot, France
^2 Dept of Engineering Science, 19 Parks Rd, Oxford OX1 3PJ, UK



Abstract. We describe two image matching techniques that owe their 
success to a combination of geometric and photometric constraints. In the 
first, images are matched under similarity transformations by using local 
intensity invariants and semi-local geometric constraints. In the second, 
3D curves and lines are matched between images using epipolar geometry 
and local photometric constraints. Both techniques are illustrated on real 
images. 

We show that these two techniques may be combined and are comple- 
mentary for the application of image retrieval from an image database. 
Given a query image, local intensity invariants are used to obtain a set of 
potential candidate matches from the database. This is very efficient as 
it is implemented as an indexing algorithm. Curve matching is then used
to obtain a more significant ranking score. It is shown that for correctly 
retrieved images many curves are matched, whilst incorrect candidates 
obtain very low ranking. 



1 Introduction 

The objective of this work is efficient image based matching. Suppose we have 
a large database of images and wish to retrieve images based on a supplied 
'query' image. The supplied image may be identical to one in the database.
However, more generally the supplied image may differ both geometrically and 
photometrically from any in the database. For example, the supplied image may 
only be a sub- or super-part of a database image, or be related by a planar 
projective transformation, or the images may be two views of the same scene 
acquired from different viewpoints. 

An example application to keep in mind is the retrieval and matching of 
aerial views of cities. If the supplied image is acquired from a large distance, by a 
satellite for example, then the geometric distortions with respect to the database 
images are planar projective and partial overlap. However, if the supplied image 
is acquired at a distance where motion parallax is significant, by a low flying 
plane for example, then the geometric distortion can not be covered by a planar 
transformation and 3D effects must be taken into account. The illumination 
conditions (sun, clouds etc.) may well also differ between the supplied image
and the database images.

There are two key ideas explored here. The first is that matching can be
made more robust by using both geometric and photometric information. This 
is illustrated in two ways: first, in section 2, we describe a method of image 
retrieval based on local interest point descriptors which is invariant to image 
similarity transformations; second, in section 3, we describe a method of curve 
matching between images of 3D scenes acquired from different viewpoints. 

The second idea is that the efficiency of indexing using interest points can be 
supplemented by the verification power of curve matching. This is illustrated in 
section 4 where it is shown that the interest point matcher provides fast access to 
an image database, and that the retrieved images may be ranked by the number 
of matched curves. 

2 Image Retrieval Based on Intensity Invariants 

The key contribution of several recognition systems has been a method of cutting 
down the complexity of matching. For example, tree search is used in [2]. In
indexing, the feature correspondence and search of the model database are replaced
by a look-up table mechanism [10]. The major difficulty of these approaches is 
that they are geometry based which implies that they require CAD-like repre- 
sentations such as line groupings or polyhedra. These representations are not 
available for objects such as trees or paintings, and can often be difficult to 
extract even from images of suitable CAD-like objects. 

An alternative approach is to not impose what has to be seen in the image 
(points, lines . . . ) but rather to use the photometric information in the image 
to characterise an object. Previous approaches have used histograms [18] and 
related measures which are less sensitive to illumination changes [5, 11]. 




(Figure label: vector of local characteristics)


Fig. 1. Representation of an image. 



The idea reviewed here, which originally appeared in [13, 14], is to use local 
intensity invariants as image descriptors. These descriptors are computed at 
automatically detected interest points (cf. figure 1). Interest points are local 
features with high informational content [15] and enable differentiation between 
many objects. Image retrieval based on the intensity invariants can be structured 
efficiently as an indexing task. 

Experimental results show correct retrieval in the case of partial visibility, 
similarity transformations, extraneous features, and small perspective
deformations.

2.1 Interest Points

Computing image descriptors for each pixel in the image creates too much in- 
formation. Interest points are local features at which the signal changes two- 
dimensionally. In the context of matching, detectors should be repeatable, that 
is a 3D point should be detected independently of changes in the imaging con- 
ditions. A comparison of different detectors under varying conditions [13] has 
shown that most repeatable results are obtained for the detector of Harris [6]. 
The basic idea of this detector is to use the auto-correlation function in order to 
determine locations where the signal changes in two directions. 
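
The auto-correlation idea can be sketched as follows. This is a simplified pure-Python illustration, not the detector as used in the experiments: it assumes central-difference gradients, a uniform 3x3 summation window instead of Gaussian weighting, and the commonly used constant k = 0.04; the function name `harris_response` and the test image are hypothetical. The response is large and positive only where the gradient varies in two directions.

```python
def harris_response(img, k=0.04):
    """Corner response det(M) - k * trace(M)^2 for every interior pixel.

    img is a list of rows of grey values; M is the 2x2 matrix of summed
    gradient products over a 3x3 window (uniform weights, an assumption).
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            resp[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return resp

# Bright quadrant with its corner at (x=4, y=4) on a dark background.
img = [[1.0 if x >= 4 and y >= 4 else 0.0 for x in range(10)] for y in range(10)]
resp = harris_response(img)
corner, edge, flat = resp[4][4], resp[7][4], resp[1][1]
```

On this image the response is positive at the corner, negative along the straight edge and zero in the flat region, so thresholding the response keeps only the corner.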

Figure 2 shows interest points detected on the same scene under rotation. 
The repeatability rate is 92% which means that 92% of the points detected in 
the first image are detected in the second one. Experiments with images taken 
under different conditions show that the average repeatability rate is about 90%. 
Moreover, 50% repeatability is sufficient for the remaining process if we use 
robust methods. 




Fig. 2. Interest points detected on the same scene under rotation of the world 
plane. The image rotation between the left image and right image is 155 degrees. 
The repeatability rate is 92%. 



2.2 Intensity Invariants 

The neighbourhood of each interest point is described by a vector of local in- 
tensity derivatives. These derivatives are computed stably by convolution with 
Gaussian derivatives. In order to obtain invariance under rigid displacements in 
the image, differential invariants are computed [4, 9]. The invariants used here 
are limited to third order. The vector which contains these invariants is denoted 
V. Among the components of V are the average luminance, the square of the 
gradient magnitude and the Laplacian. 

To deal with scale changes, invariants are inserted into a multi-scale frame- 
work, that is the vector of invariants is computed at several scales [21]. Scale 
quantisation is of course necessary for a multi-scale approach. Experiments have 
shown that matching based on invariants is tolerant to a scale change of 20% [13]. 
We have thus chosen a scale quantisation which ensures that the difference
between consecutive sizes is less than 20%.

Our characterisation is now invariant to similarity transformations which are
additionally quasi-invariant to 3D projection [1].
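
A scale quantisation of the kind described above can be sketched as a geometric series whose step stays below the tolerated 20% change. The function name `quantised_scales` and the chosen range are hypothetical; the text does not state which scales were actually used.

```python
import math

def quantised_scales(s_min, s_max, max_ratio=1.2):
    """Geometric series from s_min to s_max with consecutive ratio < max_ratio."""
    # Smallest number of steps for which the per-step ratio drops below max_ratio.
    steps = int(math.floor(math.log(s_max / s_min) / math.log(max_ratio))) + 1
    ratio = (s_max / s_min) ** (1.0 / steps)
    return [s_min * ratio ** i for i in range(steps + 1)]

scales = quantised_scales(1.0, 2.0)
ratios = [b / a for a, b in zip(scales, scales[1:])]
```

Covering a factor of two in size this way needs five scales, each about 19% larger than the previous one.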

2.3 Retrieval Algorithm 

Vector comparison Similarity of two invariant vectors is quantified using the
Mahalanobis distance $d_M$. This distance takes into account the different
magnitude as well as the covariance matrix $\Lambda$ of the components. For two
vectors $\mathbf{a}$ and $\mathbf{b}$,
$d_M(\mathbf{b}, \mathbf{a}) = \sqrt{(\mathbf{b} - \mathbf{a})^T \Lambda^{-1} (\mathbf{b} - \mathbf{a})}$.

In order to obtain accurate results for the distance, it is important to have 
a representative covariance matrix which takes into account signal noise, lu- 
minance variations, as well as imprecision of the interest point location. As a 
theoretical computation seems impossible to derive given realistic hypotheses, it 
is estimated statistically here by tracking interest points in image sequences. 

The Mahalanobis distance is impractical for implementing a fast indexing
technique. However, a base change makes conversion into the standard Euclidean
distance $d_E$ possible.
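
One way to realise such a base change is to factor the covariance matrix and transform every vector once, after which plain Euclidean distances equal the Mahalanobis distances. The sketch below uses a Cholesky factorisation, which is an assumption: the text does not say which decomposition was used. The helper names and the 2x2 example covariance are hypothetical.

```python
import math

def cholesky(cov):
    """Lower-triangular L with L * L^T = cov (cov symmetric positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low

def whiten(vec, low):
    """Solve L * y = vec by forward substitution, i.e. y = L^-1 * vec."""
    y = []
    for i in range(len(vec)):
        s = sum(low[i][k] * y[k] for k in range(i))
        y.append((vec[i] - s) / low[i][i])
    return y

cov = [[4.0, 2.0], [2.0, 3.0]]        # example covariance (assumed values)
low = cholesky(cov)
a, b = [1.0, 2.0], [3.0, 1.0]
euclidean_after_whitening = math.dist(whiten(a, low), whiten(b, low))
```

For this example the Mahalanobis distance computed directly from the inverse covariance is the square root of 3, and the Euclidean distance between the transformed vectors gives the same value.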

Image database A database contains a set {M_k} of models. Each model M_k
is defined by the vectors of invariants {V_j} calculated at the interest points of
the model images. During the storage process, each vector V_j is added to the
database with a link to the model k for which it has been computed. Formally,
the simplest database is a table of couples (V_j, k).

Voting algorithm Recognition consists of finding the model $M_{\hat{k}}$ which
corresponds to a given query image $I$, that is the model which is most similar
to this image. For this image a set of vectors $\{V_l\}$ is computed which
corresponds to the extracted interest points. These vectors are then compared
to the $V_j$ of the base by computing $d_M(V_l, V_j) = d_{l,j}$. If this
distance is below a threshold $t$, the corresponding model gets a vote.

The idea of the voting algorithm is to sum the number of times each model
is selected. This sum is stored in the vector $T(k)$. The model that is selected
most often is considered to be the best match: the image represents the model
$M_{\hat{k}}$ for which $\hat{k} = \arg\max_k T(k)$.

Figure 3 shows an example of a vector $T(k)$ in the form of a histogram. Image
0 is correctly recognised. However, other images have obtained almost equivalent
scores.
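
The voting step can be sketched in a few lines. The sketch compares every query vector with every database entry (no indexing yet) and assumes the vectors have already been transformed so that the Euclidean distance can be used; the function name `vote`, the toy vectors and the threshold value are hypothetical.

```python
import math
from collections import Counter

def vote(query_vectors, database, t):
    """Count, per model, how many (query vector, database vector) pairs are
    closer than the threshold t. database is a list of (vector, model_id)."""
    votes = Counter()
    for q in query_vectors:
        for v, k in database:
            if math.dist(q, v) < t:
                votes[k] += 1
    return votes

database = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((5.0, 5.0), 1), ((1.1, 0.9), 1)]
query = [(0.05, 0.0), (1.0, 1.05), (9.0, 9.0)]

T = vote(query, database, t=0.2)
best_model = T.most_common(1)[0][0]   # arg max over k of T(k)
```

Note that the second query vector votes for both models, which is the kind of ambiguity that makes the raw scores in the histogram less distinctive.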

Multi-dimensional indexing Without indexing, the complexity of the voting
algorithm is of the order of $l \times N$, where $l$ is the number of features
in the query image and $N$ the total number of features in the database. As $N$
is large (about 150,000 in our tests), an indexing technique needs to be used.

Our search structure is a variant of k-d trees. Each dimension of the space
is considered sequentially. Access to a value in one dimension is made through
fixed-size 1-dimensional buckets. Corresponding buckets and their neighbours
can be directly accessed. Accessing neighbours is necessary to take uncertainty
into account. The complexity of such an indexing is of the order of $l$ (the
number of features of the query image).

Fig. 3. Result of the voting algorithm: the number of votes is displayed for
each of the 100 model images. Image 0 is recognised correctly.

This indexing technique leads to very efficient recognition. The mean
retrieval time for our database containing 1020 objects (see figure 6) is less than
5 seconds on an UltraSparc 30.
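
The effect of fixed-size buckets with neighbour access can be illustrated with a hash grid. This is a deliberate simplification and not the k-d tree variant described above: all dimensions are quantised at once into one dictionary key, and the class name `BucketIndex` and the bucket size are hypothetical.

```python
from collections import defaultdict
from itertools import product

class BucketIndex:
    """Hash grid: every vector is stored under its tuple of bucket numbers."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size
        self.table = defaultdict(list)

    def _key(self, vec):
        return tuple(int(c // self.bucket_size) for c in vec)

    def add(self, vec, model_id):
        self.table[self._key(vec)].append((vec, model_id))

    def candidates(self, vec):
        """Entries in the bucket of vec and in all adjacent buckets."""
        key = self._key(vec)
        found = []
        for offset in product((-1, 0, 1), repeat=len(key)):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            found.extend(self.table.get(neighbour, []))
        return found

index = BucketIndex(bucket_size=1.0)
index.add((0.95, 0.5), "a")
index.add((1.05, 0.5), "b")   # close to "a" but in the next bucket
index.add((5.0, 5.0), "c")

near = sorted(model for _, model in index.candidates((0.99, 0.5)))
```

Looking only in the query's own bucket would miss "b", which lies just across a bucket boundary; visiting the neighbours recovers it while still skipping distant entries such as "c".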



2.4 Semi-local Constraints 

Having a large number of models or many very similar ones raises the probability 
that a feature will vote for several models. We therefore add the use of local shape 
configurations (see figure 4). 




(Figure labels: a database entry and its p closest features; a match)



Fig. 4. Semi-local constraints : neighbours of the point have to match and angles 
have to correspond. Note that not all neighbours have to be matched correctly. 

For each feature (interest point) in the database, the p closest features in the
image are selected. Requiring that all p closest neighbours be matched correctly
would presuppose that no points are mis-detected. Therefore, we only require
that at least 50% of the neighbours match. In order to increase the recognition
rate further, geometric constraints are added. As we suppose that the
transformation can be locally approximated by a similarity transformation,
angles and length ratios of the semi-local shape configurations have to be
consistent.
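
The neighbour-support part of this test can be sketched as follows. The sketch covers only the 50% rule; the angle and length-ratio checks are omitted. The function names, the toy point sets and the choice p = 2 are hypothetical.

```python
import math

def nearest(points, idx, p):
    """Indices of the p points closest to points[idx] (excluding itself)."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:p]

def semi_local_ok(match, matches, db_points, query_points, p=2, min_fraction=0.5):
    """Accept a tentative match (db_idx, q_idx) only if enough of the database
    point's neighbours are matched to neighbours of the query point."""
    db_idx, q_idx = match
    db_neighbours = nearest(db_points, db_idx, p)
    q_neighbours = set(nearest(query_points, q_idx, p))
    supported = sum(
        1 for n in db_neighbours if any((n, m) in matches for m in q_neighbours)
    )
    return supported >= min_fraction * len(db_neighbours)

db_points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
query_points = [(5, 5), (6, 5), (5, 6), (30, 30)]
matches = {(0, 0), (1, 1), (2, 2), (3, 3)}   # tentative (db_idx, q_idx) pairs

good = semi_local_ok((0, 0), matches, db_points, query_points)
bad = semi_local_ok((3, 3), matches, db_points, query_points)
```

The match (0, 0) is supported by both of its neighbours, whereas (3, 3) has no matched neighbours and would no longer contribute a vote.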

An example using the geometrical coherence and the semi-local constraints
is displayed in figure 5. It gives the votes if semi-local constraints are
applied to the example in figure 3. The score of the object to be recognised is
now much more distinctive.

Fig. 5. Result of applying semi-local constraints : the number of votes are dis- 
played for each model image. Semi-local constraints decrease the probability of 
false votes. Image 0 is recognised much more distinctively than in figure 3.

2.5 Experimental Results

Experiments have been conducted for an image database containing 1020 im- 
ages. They have shown the robustness of the method to image rotation, scale 
change, small viewpoint variations, partial visibility and extraneous features. 
The obtained recognition rate is above 99% for a variety of test images taken 
under different conditions. 

Content of the database The database includes different kinds of images such 
as 200 paintings, 100 aerial images and 720 images of 3D objects (see figure 6). 
3D objects include the Columbia database. These images are of a wide variety. 
However, some of the painting images and some of the aerial images are very 
similar. This leads to ambiguities which the recognition method is capable of 
dealing with. 




Fig. 6. Some images of the database. The database contains 1020 images.

In the case of a planar 2D object, an object is represented by one image in
the database. This is also the case for nearly planar objects as for aerial images. 
A 3D object has to be represented by images taken from different viewpoints. 
Images are stored in the database with 20 degrees viewpoint changes. 



Recognition results Three examples are now given, one for each type of image. 
For all of them, the image on the right is stored in the database. It is correctly 
retrieved using any of the images on the left. Figure 7 shows recognition of a 
painting image in the case of image rotation and scale change. It also shows that 
correct recognition is possible if only part of an image is given. 




Fig. 7. The image on the right is correctly retrieved using any of the images on 
the left. Images are rotated, scaled and only part of the image is given. 



In figure 8 an example of an aerial image is displayed. It shows correct re- 
trieval in the case of image rotation and if part of an image is used. In the case 
of aerial images we also have to deal with a change in viewpoint and extraneous 
features. Notice that buildings appear differently because viewing angles have 
changed and cars have moved. Figure 9 shows recognition of a 3D object. 




Fig. 8. The image on the right is correctly retrieved using any of the images on 
the left. Images are seen from a different viewpoint (courtesy of Istar).

Fig. 9. On the left the image used for retrieval and on the right the retrieved
image. Matched interest points are displayed. 

3 Curve Matching 

In this section we review a method for line and curve matching between two per- 
spective images of a 3D scene acquired from different viewpoints. It is assumed 
that 3D effects can not be ignored, and that the fundamental matrix, F, for the 
image pair is available. We return to how the fundamental matrix is obtained 
in section 4, where it is shown that the number of matched curves provides a 
ranking score in image retrieval. 

Previous criteria for stereo curve matching have included epipolar and order- 
ing constraints, figural continuity [12], variation in disparity [23], and consistency 
of curve groups [3, 7]. The method reviewed here, which originally appeared 
in [16, 17], is to supplement such geometric constraints by photometric con- 
straints on the intensity neighbourhood of the curve. In particular the similarity 
of the curves is assessed by cross-correlation of the curve intensity neighbour- 
hoods at corresponding points. This is described in more detail in the following 
section. 

We will describe two algorithms: the first is applicable to nearby views; the 
second to wide baselines where account must be taken of the viewpoint change. 
The algorithms are robust to deficiencies in the curve segment extraction and 
partial occlusion. Experimental results are given for image pairs with varying 
motions between views. 

3.1 Basic Curve Matching Algorithm 

We suppose that we have obtained lines and curves in each image. The task 
is then to determine which lines/curves, if any, match. The problem is non-trivial 
because of the usual fragmentation caused by over- and under-segmentation. 
The algorithm proceeds by computing a pair-wise similarity score 
between each curve (or line) in the first image and each curve (or line) in the 
second. The matches are decided by a winner-takes-all scheme based on the 
similarity scores. 

The photometric information is employed in computing the similarity score. 
Consider two possibly corresponding curves c and c' in the first and second 
images respectively. The curves correspond if they are images of the same 
3D curve. If they correspond, then a point-to-point correspondence on 
the curves may be determined using the epipolar geometry: for an image point 
x on the curve c, the epipolar line in the second image is l'_e = Fx, and this line 
intersects the curve c' in the point x' corresponding to x, i.e. x and x' are images 
of the same 3D point. Consequently, the image intensity neighbourhoods of x 
and x' should be similar. The similarity score for c and c' is then determined by 
averaging the similarity of neighbourhoods over all corresponding points on the 
curves. The similarity of neighbourhoods is determined by cross-correlation. 

If the curves do indeed correspond, then the similarity score will be high; 
in general it will certainly be higher than the score for images of two different 
3D curves. This is the basis of the winner-takes-all allocation of curve matches. 
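
The scoring step described above can be sketched as follows. This is only an illustrative outline, not the implementation of [16, 17]: the function names, the representation of a curve as a polyline of (x, y) vertices, and the representation of an intensity neighbourhood as a flat list of grey values are all assumptions made for the example.

```python
import math

def epipolar_line(F, x):
    """Return the epipolar line F x for a homogeneous point x = (x, y, 1)."""
    return tuple(sum(F[r][c] * x[c] for c in range(3)) for r in range(3))

def intersect_polyline(line, curve):
    """First point where the line a*x + b*y + c = 0 meets a polyline, or None.

    Segments lying along the line are skipped, since no one-to-one
    point correspondence is available there.
    """
    a, b, c = line
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        d0 = a * x0 + b * y0 + c
        d1 = a * x1 + b * y1 + c
        if d0 * d1 <= 0 and d0 != d1:
            t = d0 / (d0 - d1)
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return None

def ncc(p, q):
    """Normalised cross-correlation of two equal-length intensity lists."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))
    return num / den if den > 0 else 0.0

def curve_similarity(patch_pairs):
    """Average the neighbourhood correlation over all corresponding points."""
    if not patch_pairs:
        return 0.0
    return sum(ncc(p, q) for p, q in patch_pairs) / len(patch_pairs)
```

In use, each point x on c would be sent through `epipolar_line` and `intersect_polyline` to find x' on c', the two neighbourhoods would be sampled, and the pair of curves with the highest `curve_similarity` would win.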



Matching performance The algorithm is demonstrated here on the two image 
pairs shown in figures 10 and 12. The ground-truth matches are assessed by 
hand. 




frame 11        frame 15        frame 19 



Fig. 10. The “bottle” sequence. Frames are selected from this sequence to form 
image pairs. The camera motion between the frames is fairly uniform, so that 
the frame number is a good indicator of the distance between views. 



Figure 11 shows a typical matching result for two bottle images (frames 11 and 
15). Only the parts of the matched contours for which there are corresponding 
edgels in both views are shown. This excludes the parts of the chains along 
epipolar lines (where one-to-one point correspondences are not available), and 
also those parts of the chain which are detected as edgels in one view but not 
in the other. Only corresponding parts are shown for the rest of the examples in 
this paper. 

The performance of the algorithm depends on the number and quality of the 
curves detected in each image. However, as shown in table 1, the algorithm 
performs extremely well over a 100% variation in the curve segmentation 
parameters. The two parameters are the minimum intensity gradient at which edgels 
are included in the linked contour (a high value excludes weak edges), and the 
minimum number of edgels in the linked chain (a high value excludes short 
chains). Most of the mismatches may be attributed to specularities on the bottle. 
Curves arising from specularities can be removed by a pre-process. 

Fig. 11. Short baseline matching for frames 11 and 15 of the bottle sequence. 
Upper pair: The curves which are input to the matching algorithm. The contours 
are extracted with a gradient threshold of 60 and a length threshold of 60 pixels. 
There are 37 and 47 contours in the left and right images respectively. Lower 
pair: The 29 contours matched by the algorithm, showing only the parts which 
have corresponding edgels in both views. 97% of the 29 matches are correct. 

Figure 12 shows matched line segments for aerial images of an urban scene. 
248 and 236 line segments are obtained for the left and right images respectively; 
122 of the lines are matched, and 97.5% of the 122 matches obtained are correct. 

It is evident from these examples that for all choices of the parameters shown 
a large proportion (> 80%) of the potential line/curve matches are successfully 
obtained. 



3.2 Wide Base Line Matching Algorithm 

If there is a significant rotation of the camera or a wide baseline between views, 
then the simple correlation of image intensities employed above will fail as a 
measure of the similarity of the neighbourhoods of corresponding image points 
on the curve. Consider a camera motion consisting of a translation parallel to 
the image x-axis, followed by a 90° rotation about the camera principal axis (i.e. 
a rotation axis perpendicular to the image plane). The cross-correlation of the 
neighbourhoods will be very low if the rotation is not corrected for. 

Table 1. Edge detection parameters and curve matching results for the short 
baseline algorithm applied to frames 11 and 15 of the “bottle” sequence. For the 
60/60 case there is one false match; for the 60/30 case there is one false match, 
the other three being due to specularities; and for the 30/30 case there are two 
false matches, the other four being due to specularities. 

Fig. 12. Upper pair: Two aerial views of a building acquired from different 
viewpoints. Lower pair: Matched segments using the short range motion algorithm. 
97.5% of the 122 matches shown are correct. 

Suppose the 3D curve lies on a surface. Then the rotation, and in general all 
perspective distortion, can be corrected for if the cross-correlation is computed 
as follows: for each point in the intensity neighbourhood of x in the first image, 
compute the intersection of the back projected ray with the surface, then 
determine the image of this intersection in the second view. The surface defines a 
mapping between the neighbourhoods of x and x', and the cross-correlation is 
computed over points related by this map. We do not know the surface, and so 
do not know this map, but a very good approximation to the map is provided by 
the homography induced by the tangent plane of the surface at the 3D point of 
interest. 
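
As a rough illustration of how a homography relates the two neighbourhoods, the sketch below maps every pixel position of a square window in the first image into the second image. The helper names and the row-major 3x3 list layout for H are assumptions for the example; sampling the intensities at the mapped positions (with interpolation) and correlating them is left out.

```python
def apply_homography(H, pt):
    """Map an inhomogeneous point (x, y) through a 3x3 homography H."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def map_neighbourhood(H, centre, half_width):
    """Positions in the second view of a square window around `centre`."""
    cx, cy = centre
    return [
        apply_homography(H, (cx + dx, cy + dy))
        for dy in range(-half_width, half_width + 1)
        for dx in range(-half_width, half_width + 1)
    ]
```

For the 90° rotation example in the text, the window is simply rotated before correlation, so the neighbourhoods line up again.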

In the case of line matching [16] this homography can only be determined 
up to a one parameter family because a line in 3D only determines the plane 
inducing the homography up to a one parameter family. This means that for 
lines a one dimensional search over homographies is required. However, in the 
case of curve matching, the curve osculating plane provides a homography that 
may be used, and no search over homographies is required. 




Fig. 13. The osculating plane of a (non-planar) curve varies, but is always de- 
fined in 3-space provided the curvature is not zero. This plane is determined 
uniquely from the image of the curve in two views. The plane induces a homog- 
raphy between the images. 



In more detail, suppose a plane curve is imaged in two views, as illustrated in 
figure 13. Then given the tangent lines and curvatures at corresponding points, 
x ↔ x', of the curves in each image, and the fundamental matrix between 
views, the homography H induced by the osculating plane may be computed 
uniquely [17]. 

An example of wide baseline matching for frames 11 and 19 of the “bottle” 
sequence (cf. figure 10) is shown in figure 14. Of 37 and 48 curves in the left and 
right images, respectively, 16 are matched, and 14 of these matches are correct. 

Fig. 14. Wide baseline matching for frames 11 and 19 of the bottle sequence. 
88% of the 16 matched contour chains are correct. 



4 Image Matching Using Curve Verification 

It has been shown in section 2 that, given a query image, a set of possible matching 
images can be retrieved from an image database. This retrieval is efficient 
because it is based on indexing of interest point invariants. It then remains 
to determine which images in the set of possible matches do indeed match the 
query image, and to rank the matching images. We show in this section that 
curve matching may be used to verify image matches and also provide a ranking. 
These verification tests require a multi-view relation (such as a planar projective 
transformation or fundamental matrix) and the point correspondences provide 
this. 

Suppose the query and database images are views of a 3D scene acquired from 
different points. A first verification test is to determine if the interest point cor- 
respondences satisfy epipolar geometry constraints. This is equivalent to seeing 
if a large proportion of the matches are consistent with a fundamental matrix. 
Robust methods are now well established for simultaneously computing the fun- 
damental matrix and a set of consistent point matches, from a set of putative 
point matches, many of which may be incorrect [19, 20, 24]. This can be a weak test, 
since correspondences can accidentally line up with epipolar lines, and if there are 
only a limited number of interest points it is always possible to obtain a consistent 
solution for the fundamental matrix. 
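
The consistency check at the heart of this first test can be sketched as below: a putative match is counted as consistent when x' lies close to the epipolar line Fx. The function names and the pixel tolerance are assumptions for illustration, and the robust estimation of F itself [19, 20, 24] is not shown.

```python
import math

def epipolar_distance(F, x, x_prime):
    """Distance in pixels from x' to the epipolar line F x (x, x' given as (x, y))."""
    xh = (x[0], x[1], 1.0)
    a, b, c = (sum(F[r][k] * xh[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        return float("inf")  # degenerate line, e.g. x at the epipole
    return abs(a * x_prime[0] + b * x_prime[1] + c) / norm

def count_consistent(F, matches, tol=1.5):
    """Number of putative matches (x, x') within `tol` pixels of their epipolar line."""
    return sum(1 for x, xp in matches if epipolar_distance(F, x, xp) <= tol)
```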

If the images pass the first verification test then a fundamental matrix is 
available between the query and database image. A second verification test is 
then to use the line/curve matcher described in section 3 to see if a large 
proportion of the lines and curves match. The retrieved images may then be ranked 
by this proportion, a higher number indicating a greater overlap in viewpoints 
of the 3D scene. 
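
A minimal sketch of the ranking step, assuming each candidate image has already been reduced to a pair (matched curves, total curves); the dictionary layout and function name are hypothetical.

```python
def rank_by_curve_matches(candidates):
    """Order candidate image names by the proportion of curves matched, best first.

    `candidates` maps an image name to (matched_curves, total_curves).
    """
    def proportion(item):
        matched, total = item[1]
        return matched / total if total else 0.0

    ranked = sorted(candidates.items(), key=proportion, reverse=True)
    return [name for name, _ in ranked]
```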

These two verification steps are demonstrated in the following example. For 
the query image on the left of figure 15 there are 11 images in the database with 
more than 7 interest point correspondences (the minimum number required to 
compute the fundamental matrix). These images are determined by indexing 
on local intensity invariants and semi-local geometric constraints. The database 
images with the highest and next highest voting scores are shown on the right 
of figure 15 and in figure 16. The match with the highest vote is actually correct, 
whilst the other is incorrect. Both images pass the first verification test, so 
a fundamental matrix is available between the query image and each of the 
database images. The curve matcher of section 3 is then applied to each image 
pair. In the case of the match with the highest vote, 802 curve edgels are matched 
(see figure 17). In the case of the match with the second highest vote no edgels 
are matched. The correct match is therefore very clearly identified. 

Computing the fundamental matrix as a means of object recognition has 
been proposed before by Xu and Zhang [22] amongst others. However, com- 
bining the fundamental matrix with the additional geometric and photometric 
constraints provided by curves delivers a powerful image matcher: it has the 
indexing advantage of interest points combined with the verification strength of 
curves. 




Fig. 15. Left: the query image. Right: the best match using intensity invariants. 
Interest points used during the matching process are displayed. 



5 Discussion and Extensions 

The interplay between geometric and photometric constraints has been illus- 
trated at a number of points throughout this paper. 

Fig. 16. The second best match using intensity invariants, which is in fact 
incorrect. Matched interest points are displayed. This match is rejected by the 
curve verification test. 

Fig. 17. Verification test: Matched edges for the image pair in figure 15. 802 
edgels have been matched. 

First, it has been shown that under image similarity transformations point 
correspondences can be established between two images simply by employing the 
discriminating power of local intensity patterns. However, blindly voting on individual 
invariants is not sufficient to guarantee the correctness of the image match in 
database indexing. It is crucial to introduce further geometric constraints in the 
form of semi-local coherence on the point patterns. 

Second, it has been shown that although the matching of 3D curves between 
two images would appear to be very ambiguous - since a point on a curve in one 
image could potentially match with any of the points at which its epipolar line 
intersects curves in the other image - the introduction of photometric constraints 
on the curve neighbourhoods virtually eliminates the ambiguity. 

These two techniques have been combined for the application of matching 
images of a 3D scene from a large set of images acquired from different view- 
points. Curves, more than points, capture the structure of the scene, and the 
number of curve matches may be used to rank the image matches. Indeed an 
extension of this technique would be to detect change between the images (such 
as the addition or removal of a building [8]) by the spatial arrangement of the 
unmatched curves and lines. 

In the context of image retrieval 3D effects can often be ignored, as a scene 
may be planar or 3D effects are not significant. The map between images is then 
a simple planar homography (projective transformation). This homography may 
be computed from the interest point correspondences. Curve matching using 
both geometry and photometric information can then proceed in much the same 
manner as that of section 3, with the homography providing the curve point 
correspondences. 

The authors are grateful to Dani Lischinski of the Hebrew University of Jerusalem 
for providing the triangulation code. 

Abstract. We introduce a new algorithm for identifying objects in clut- 
tered images, based on approximate subgraph matching. This algorithm 
is robust under moderate variations in the camera viewpoints. In other 
words, it is expected to recognize an object (whose model is derived 
from a template image) in a search image, even when the cameras of the 
template and search images are substantially different. The algorithm 
represents the objects in the template and search images by weighted 
adjacency graphs. Then the problem of recognizing the template object 
in the search image is reduced to the problem of approximately match- 
ing the template graph as a subgraph of the search image graph. The 
matching procedure is somewhat insensitive to minor graph variations, 
thus leading to a recognition algorithm which is robust with respect to 
camera variations. 
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i t th oun a i (that i ) h poth iz to n lo th ion a 

t t . a t t p in thi p o th in th ima a t t u in 

a ann - t 1 t to an lin m nt a tt to th ultin 1 . 

t i a ona 1 to a um that th ion oun a i pa th ou h th ult- 

in lin m nt in un ou a umption a a oun a will p o u 

Fig. 1. Left: The result of edge detection in a template image containing a cup. 
Right: The result of adjusting lines fitted to the cup edge segmentation. 



a i ontinuit in th int n it an olo va iation o th n lo 
thu how up u in t tion. ow v t pi all th 

t t in th o m o num ou mall o n lin m nt ( 

t i iffi ult to i nti th xa t om t o th n lo a 

th lin m nt . o imp ov th oun a om t i w u 
to u th p o th lin m nt . om o th h u i ti a 



ion an 
oun a i a 
u 1(1 t)). 
i tl om 
om h u i ti 



— M lin that a n a 1 ollin a an within om p oximit th hoi o 

a h oth . hi i u ul in atin a in 1 whi h ma hav o n 

up into va iou mall ( ut n a 1 pa all 1) m nt u in t tion. 

— at - un tion om pai o lin on o whi h n lo to th in i 

o an awa om th n point o th oth . hi i u ul in atin 

int tion o on o lu in o j t . hi will h Ip in o tainin w 11- 

n a on oth o j t . 

— nt t lin who n -point a lo to a h oth an th lin a at 

o tu an 1 . hi at th on o an o j t whi h ma not hav 

n t t u in m ntation. 



i u 1( i ht) how th ult o appl in th h u i ti on th m ntation 
o th ima in u 1(1 t). n ou xp i n th h u i ti ai i ni anti 
in o tin mo t o th n at oun a m nt . 



2.2 Estimating Initial Uniform Regions: Constrained Triangulation 

Using the boundary line segments from the previous step, we now generate an 
initial partition of the image into triangles of uniform intensity and color. This is 
accomplished by a constrained triangulation of the boundary lines. A constrained 
triangulation produces a set of triangles which join nearby points (end-points of 
the lines) but respect the constraining boundary lines. That is, each boundary 
line segment will be an edge of some triangle. 

Fig. 2. Constrained triangulation on the adjusted cup lines. On the left, the 
triangles only; on the right the triangles superimposed on the image. 

Since all triangles are formed from the end-points and lines on the boundaries 
of the faces, each triangle lies completely inside a face. Moreover, since the 
triangles cover the whole image, each face can be represented as a union of a 
finite number of these triangles. As an example, see the results of constrained 
triangulation on an image segmentation in figure 2. 



2.3 Extracting Object Faces: Region Merging 

In the next step, a region-merging procedure is used to incrementally generate 
the visible object faces in the image. Starting with the triangular regions from 
the constrained triangulation, neighboring regions are recursively merged if they 
have at least one of the following properties: 

1. Similar color intensities: Two adjacent regions are merged if the difference 
between their average color intensity vectors is less than a threshold. This 
is a reasonable merging property since neighboring faces in objects are at 
angles to each other and are likely to cast images of different intensities. 
As a refinement of this method, one could merge two regions based on a 
decision of which of two hypotheses (the two regions are separate; the two 
regions should form a single region) is preferable based on the color statistics 
of the regions. In addition, a linear or more complex color gradient over a 
face could be modeled. These methods have been suggested in [15, 16] but 
we have not tried them yet. 

2. Unsupported bridge: Two adjacent regions are merged if the percentage of 
common edges between them which are unsupported is larger than a threshold. 
An edge is said to be supported if a specified percentage of its pixels 
belong to an edgel detected by the edge detector. Merging based on this property 
will ensure the inclusion of those boundary segments which were missing 
from the set of line segments derived from edgels. This is demonstrated by 
the fact that a number of non-constraining lines in the triangulation end up 
being found supported. 

At each merge, the properties (size and color) of the new larger region 
are computed from the properties of the two regions being merged. 




Fig. 3. Faces of the cup (left) and urban scene (right) extracted by our algorithm. 



n 1 

tw 

th 

th 

th 



a o iat 
alon 
not p 
a o a 
mall 
m in 



The merging iteration continues until the color intensities of each pair of 
neighboring regions are sufficiently different and most of the common edges 
between them have support from the segmentation. Under our assumptions about 
the nature of the objects and the illumination, it is reasonable to assume that 
the resulting regions are likely to be images of the faces of objects pictured in 
the image. As an illustration see figure 3, which shows the faces extracted using 
our algorithm. 

h ult o th m ntation an m in 
olo ( ) valu . pi all th 

ion oun a i . h a mov 

nt m anin ul a in th ima 

oun a . imila 1 an i ual v 
ion 1 that a out 30 qua pix 1 
in li ht va iation o olo hav p 



al o ithm i a to ion with 

main mall na ow ion 1 in 

om on i ation in th o 
ut a au olo t an it ion 

mall ion a mov . hu 
om tim main at th ion 
V nt th m om in m 



with a ja nt la 



ion . 



3 Deriving Graph Representations of Objects 

Once all the object faces in the image have been generated, they are represented 
as a graph. To capture the relative placement of the objects in the image 
and the topology of the scene, an adjacency graph of the faces in the scene is 
constructed. 

Each vertex in the graph represents a region, and is annotated with the shape, 
position and color attributes of the region. Shape is represented by the moment 
matrix of the region, from which one may derive the area of the region, along 
with the orientation and ratio of the principal axes of the region. In effect, the 
region is being represented as an ellipse. This shape representation is of course 
an extremely rough representation of the shape of the region. However, it is also 
quite forgiving of variations of shape along the boundaries, or even a certain 
degree of fragmentation of the region. Since matching will not be done simply on 
the basis of a region-to-region match, but rather on matching of region clusters, 
this level of shape representation has proven to be adequate. More precise shape 
estimates have been considered, however. The issue must be dictated by the 
degree of accuracy and repeatability of the segmentation process, however. The 
color of the region is represented by an RGB color vector. Other representations 
are of course possible and have been tried by other authors ([15, 16]). 

Because of the possibility of regions being fragmented, or regions being 
improperly merged, it turns out to be inappropriate to use edges in the graph to 
represent physically adjacent regions. The adjacency graph generated by such a 
rule is too sensitive to minor variations in the image segmentation. Instead, the 
choice was made of joining each vertex to the vertices representing the N closest 
regions in the segmented image. A value of N = 8 was chosen. Thus, each vertex 
in the graph has 8 neighbors. 
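
The neighbor rule just described can be sketched as follows, assuming that "closest" is measured between region centroids (the text does not say which distance is used) and using a hypothetical function name.

```python
import math

def nearest_region_graph(centroids, n_neighbors=8):
    """Join each region to its n_neighbors closest regions.

    `centroids` is a list of (x, y) region centres; the result maps each
    region index to the indices of its nearest regions, closest first.
    """
    graph = {}
    for i, (xi, yi) in enumerate(centroids):
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        )
        graph[i] = [j for _, j in others[:n_neighbors]]
    return graph
```

Note that the relation is not symmetric: region a may list b among its closest regions without b listing a.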

4 The Three-Tier Matching Method 

The reduction of the image to an attributed graph represents a significant 
simplification. The graph corresponding to a typical complicated image (the search 
image) may contain up to 500 or so vertices, whereas the graph corresponding 
to an object to be found (the template) may contain 50 vertices or so. Thus, a 
complete one-on-one comparison may be carried out in quite a short time. 

The search is carried out in three phases, as follows. 

1. Local comparison. A one-to-one comparison of each pair of vertices is 
carried out. Each pair of vertices, one from the template graph and one from 
the search graph, is assigned a score based on similarity of shape, size and 
color, within rather liberal bounds. 

2. Neighborhood comparison. The local neighborhood consisting of a vertex 
and its neighbors in the template graph is compared with a local neighborhood 
in the search graph. A score is assigned to each such neighborhood 
pair, based on compatibility and the individual vertex-pair scores. 

3. Global matching. A complete graph-matching algorithm is carried out, 
in which promising matches identified in the stage-2 matching are pieced 
together to identify a partial (or optimally a complete) graph match. 

Each of these steps will be described in more detail in later sections. The 
idea behind this multi-stage matching approach is to avoid ruling out possible 
matches at an early stage, making the matching process robust to differences in 
the segmentation and viewpoint. This approach is motivated by the scoring 
method used in tennis matches, in which a three-tier scoring system is used: 
game, set, match. At each stage, slight advantages are amplified. A player who 
wins 55% of points will win 62% of games, 82% of sets and 95.7% of matches. 
Thus, the better player will (almost) always win, despite temporary setbacks. In 
the same way, the three-tier graph matching method provides a robust way of 
converging to the correct match, despite local fluctuations of region-to-region 
scoring. 
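
The amplification in the tennis analogy can be checked with a short calculation. The sketch below assumes points are won independently with probability p, uses the standard closed form for a game with deuce, and treats a match as best of five sets; the intermediate step from games to sets (which depends on the tie-break rule) is not modelled, so the set probability of 0.82 is simply taken from the text.

```python
from math import comb

def game_win_probability(p):
    """Probability of winning a tennis game when each point is won with probability p."""
    q = 1.0 - p
    before_deuce = p ** 4 * (1 + 4 * q + 10 * q ** 2)   # win to love, 15 or 30
    reach_deuce = comb(6, 3) * p ** 3 * q ** 3           # score reaches 3 points all
    win_from_deuce = p ** 2 / (p ** 2 + q ** 2)
    return before_deuce + reach_deuce * win_from_deuce

def best_of_win_probability(s, sets_to_win=3):
    """Probability of winning a match (first to `sets_to_win` sets) given set probability s."""
    t = 1.0 - s
    return sum(
        comb(sets_to_win - 1 + k, k) * s ** sets_to_win * t ** k
        for k in range(sets_to_win)
    )
```

With p = 0.55 the game probability comes out at about 0.62, and with s = 0.82 the match probability is about 0.96, in line with the figures quoted.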



4.1 Local Matching 

In local matching, individual vertex pairs are evaluated. Each pair is assigned 
a score based on shape and color. Recall that each region is idealized as an 
ellipse. Shapes are compared on the basis of their size and eccentricity. Up to a 
factor of 2 difference in size is allowed without significant penalty. This allows 
for different scaling in the two images, within reasonable bounds. 

Because of different lighting conditions, color may differ between two images. 
The most significant change in color, however, is due to a brightness difference. 
To allow for this, colors are normalized before being compared. The color of a 
region is represented by a vector, and vectors that differ by a constant multiple 
are held to represent the same color. 

The cost of a local match between two vertices is denoted C_local. 
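
A minimal sketch of the two ingredients named above, brightness-normalized color comparison and a size test that tolerates a factor of 2. The particular formulas (cosine similarity of RGB vectors, logarithmic penalty beyond the free factor) are assumptions chosen for illustration; the text does not give the form of C_local.

```python
import math

def normalize_color(rgb):
    """Scale a color vector to unit length so brightness differences cancel."""
    norm = math.sqrt(sum(c * c for c in rgb))
    return tuple(c / norm for c in rgb) if norm > 0 else (0.0, 0.0, 0.0)

def color_similarity(rgb_a, rgb_b):
    """Cosine similarity; vectors differing by a constant multiple score 1."""
    a, b = normalize_color(rgb_a), normalize_color(rgb_b)
    return sum(x * y for x, y in zip(a, b))

def size_penalty(area_a, area_b, free_factor=2.0):
    """Zero penalty up to `free_factor` difference in size, growing beyond it."""
    ratio = max(area_a, area_b) / min(area_a, area_b)
    return 0.0 if ratio <= free_factor else math.log(ratio / free_factor)
```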



4.2 Neighborhood Matching 

Each vertex (which we shall call a core) in the graph has eight neighbors 
representing the eight closest regions. In comparing the local neighborhood of 
one core vertex v_0 with the local neighborhood of a potential match v'_0, an 
attempt is made to pair the neighbors of v_0 with those of v'_0. In this matching, 
the order of the neighbor vertices must be preserved. Thus, let v_1, v_2, ..., v_n 
be the neighbors of one core vertex, given in cyclic angular order around the core, 
and let v'_1, ..., v'_n be the neighbors of a potential match, similarly ordered. 
One seeks subsets S and S' of the indices and a one-to-one mapping σ : S → S' 
so that the matching v_i ↔ v'_σ(i) preserves cyclic order. The total cost of a 
neighborhood match is equal to 

^nbhd ^oC'jQQg^j(vo, Uq) ’^cr(i)) 

i S 

where w(v_i) is a weight between 0 and 1 that depends on the ratio of distances 
between the core vertices and the neighbors v_i and v'_σ(i). For each pair of core 
vertices v_0, v'_0 the neighborhood matching that maximizes this cost function is 
speedily and efficiently found by dynamic programming. 
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4.3 Graph Matching 



In previous sections the template image and the search image were reduced to 
a graph, and candidate matches between vertices in the two images were found. 
The goal of this section is to generate a mutually consistent set of vertex matches 
between the template and the search image. An association graph G [2] provides 
a convenient framework for this process. In considering the association 
graph, it is important not to confuse it with the region adjacency graphs that 
have been considered so far. In the association graph, vertices represent pairs of 
regions, one from each image. Such a vertex represents a hypothesized matching 
of a region from the template image with a region from the search image. 
Weighted edges in the association graph represent compatibilities between the 
region matchings denoted by the two vertices connected by the edge. 
Thus, a vertex in the association graph is given a double index, and denoted 
v_ij, meaning that it represents a match between region R_i in the template image 
and region R'_j in the search image. This match may be denoted R_i ↔ R'_j. 

As an example, if j1 ≠ j2, then v_ij1 is not compatible with v_ij2. This is because vertex 
v_ij1 represents a match R_i ↔ R'_j1 and v_ij2 represents the match R_i ↔ R'_j2, and it 
is impossible that region R_i should match both R'_j1 and R'_j2. Thus, vertices v_ij1 
and v_ij2 are incompatible and there is no edge joining the two vertices in the 
association graph. There are other cases in which matches are incompatible. For 
instance, consider a vertex v_ij representing a match R_i ↔ R'_j and a vertex v_kl 
representing a match R_k ↔ R'_l. If regions R_i and R_k are close together in the 
template image, whereas R'_j and R'_l are far apart in the search image, then the 
matches R_i ↔ R'_j and R_k ↔ R'_l are incompatible, and so there is no edge joining 
the vertices v_kl and v_ij. Matches may also be incompatible on the grounds of 
orientation or color. 

Formally, the association graph G = {V, E} is composed of a set of vertices 
V and a set of weighted edges E ⊆ V × V. Each vertex v represents a possible 
match between a template region and a search region. If there are N template 
regions and M search regions then V would have NM vertices (figure 4). In 
order to reduce the complexity of the problem, the graph G is pruned so that 
only the top 5 assignments of each template region are included in V. The 
nodes are labeled v_ij, which is interpreted as the jth possible assignment of the 
ith template region. A slack node for each template region is inserted into the 
graph. The slack node v_i0 represents the possibility of the NULL assignment of 
the ith template region, that is, no matching region exists in the other image. 
If an edge e = (v_ij, v_kl) exists then the assignments between nodes v_ij and 
v_kl are considered compatible. The weights of the edges are derived from the 
compatibility matrix C, which is defined as 

$$C_{(ij)(kl)} = \begin{cases} 0 & \text{if } j = 0 \text{ or } l = 0 \\ 0 & \text{if } i = k \text{ and } j \neq l \\ 0 \text{ to } 1 & \text{if } (i,j) = (k,l) \\ 0 \text{ to } 1 & \text{if } v_{ij} \text{ and } v_{kl} \text{ are compatible} \\ -N & \text{if vertices } v_{ij} \text{ and } v_{kl} \text{ are not compatible} \end{cases}$$

where N is the number of template regions. The values of C_(ij)(ij) represent 
the score given to the individual assignment defined by node v_ij. A subgraph 
of G represents a solution to the matching problem. The choice of weight -N 
for an incompatible match is to discriminate against incompatible matches and 
make certain that a vector with maximum weight represents a clique of 
compatible matches. 



Template Reference Association 

Objects Objects Graph 




Fig. 4. The template and search images are reduced to a set of regions. Each 
possible pair of assignments is assigned to a node in the association graph. 
Edges in the graph connect compatible assignments. 



h m tho o t minin ompati ilit an a i nin ompati ilit o 
C'(ij)(fcz) o ompati 1 mat h i a ollow . on i a an i at ion pai 

Ri — Ry h lo al n i h o hoo o ion Ri ha n mat h with n i h- 

o hoo Rj u in th n i h o hoo mat hin ta . n oin thi a to 

n i h o o th ion Ri hav n mat h with th n i h o o th ion 
Ry hi mat hin ma on i a a o pon no v al ion 

(au tothniho o Ri) with an qual num o ion in th oth 

aph. om th o pon n a p oj tiv t an o mation i omput that 

map th nt oi o Ri to th nt oi o R^ whil at th am tim anal a 
po i 1 mappin th n i h o in ion o Ri to th i pai n i h o o Rj. 

hu th n i h o hoo o pon n i mo 1 a lo 1 a po i 1 a 

p oj tiv t an o mation o th ima . L t H th p oj tiv t an o mation 
o omput . 

ow 1 t — Ri anoth an i at ion mat h. o how w 11 thi i 
ompati 1 with th mat h Ri — R^ ih p oj tiv t an o mation H i appli 
to th ion to howw \\H{Rk) o pon with/f^. am a u o thi 
o pon n th V to om RjRi i ompa with th v to RjH{Rk). 

hi i illu t at in u 5. ompati ilit o i a i n a on th 
an 1 an 1 n th i n tw n th two v to . h two a i nm nt a 



m in ompati 1 i th an 1 tw n th two v to 
o th i 1 n th atio x 2. 



45 



X 
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olo ompati ilit o i al o n . h o pon n o a o 

V t X an it n i h o with th mat h on u ation in th oth ima an 

u to n an affin t an o mation o olo pa om th on ima to 

th oth . n afhn olo t an o mation i a uita 1 mo 1 o olo va ia ilit 

un i nt li htin on ition ( 1 )■ h afhn t an o mation n o 
on mat h no pai in to t min wh th anoth mat h no pai 
i ompati 1 . 

h nal ompati ilit o i omput a 

C{ij)(kl) C’nbhd('bi) - C’nbhd(^.0 - nl ompati ilit o - 

1 n th atio ompati ilit o — olo ompati ilit o 




Fig. 5. Compatibility of two matches is determined by applying the 
transformation H defined by the neighbors of the first pair (R_i, R'_j) to the region R_k 
belonging to the second pair. The positions of H(R_k) and R'_l relative to R'_j are 
compared. 



4.4 Solution Criteria 

The Hough transform or match clustering approach assumes that a global 
transformation defined by a relatively small set of parameters can be used to map 
the template regions onto the search regions. The largest set of nodes in V 
which is consistent with a particular transformation would then constitute a 
final solution. However, just because two nodes are consistent with a particular 
transformation does not necessarily imply that the two nodes are consistent 
with each other. For instance, in the association graph of figure 4 a match (c, 4) is 
compatible with (b, 1) and (b, 1) is compatible with (c, 3). However, (c, 3) is not 
compatible with (c, 4), since c cannot simultaneously match with both 3 
and 4. 
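
The point that pairwise compatibility is not transitive can be made concrete with a small check. The sketch below encodes only the three matches mentioned in the example; the edge set is an assumption built from that sentence, not the full graph of figure 4.

```python
from itertools import combinations

def is_clique(nodes, edges):
    """True if every pair of the given nodes is joined by an edge."""
    return all(frozenset(pair) in edges for pair in combinations(nodes, 2))

# Compatible pairs from the example: (c,4)-(b,1) and (b,1)-(c,3).
edges = {
    frozenset([("c", 4), ("b", 1)]),
    frozenset([("b", 1), ("c", 3)]),
}

pairwise_ok = is_clique([("c", 4), ("b", 1)], edges)
all_three_ok = is_clique([("c", 4), ("b", 1), ("c", 3)], edges)
```

Here `pairwise_ok` is true while `all_three_ok` is false, because no edge joins (c, 3) and (c, 4).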

popula aphi al app oa h whi h an ta a vanta o om o th in- 

o mation ontain in th t u tu i a no lu t in t hniqu wh a 
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impl pth t a h i u to t min th la t onn t u aph o 

G. onn t aph i on in whi h a path o xi t tw n v pai 

o no in th aph. hi olution p nt a tain amount o on i t n . 

ow V a o th tat m nt that no a i on i t nt with no b an no 
b i on i t nt with no c o not n a il impl that no a i on i t nt 

with no c. hi 1 a to th on In ion that in o to ta nil a vanta 

0 th mutual on t aint m in th a o iation aph th nal olution 

houl p nt a liqu on G. 

u t R — V i a liqu on G i Vij,Vki — R impli that {vij,Vki) — E. 
h a h o a maximum liqu i nown to an NP ompl t p o 1 m 

14 . V n a t p unin th omputational o t a o iat with xhau tiv 

t hniqu u h a 1 woul p ohi itiv . t ha n po t 3 that 

t minin a maximum liqu i analo ou to n in th lo al maximum o 

a ina qua ati un tion. utho u h a 20 2 hav ta n a vanta o 

thi i a u in taxation an n u al n two m tho to app oximat th 

lo al maximum o a qua ati un tion wh thi maximum o pon to 

th la t liqu in th a o iation aph. Ithou h th la t liqu whi h 

1 a on th in o mation ontain inE nu ahihlvlo mutual on- 

i t n th nuan o th ompati ilit m a u in C a lot.no to 

ta a vanta o th ontinuou natu o th t n th a qua ati 

o mula i p i wh th lo al maximum o pon to th liqu that 

ha th maximum um o int nal t n th . n app oa h a on ol 

an an a jan a ual a i nm nt al o ithm ( ) i u to timat th 

optimal olution. h i an it ativ optimization al o ithm whi h t at 

th p o 1 m a a ontinuou p o ut onv to a i t olution. v n 
thou h th olution mi ht n at a on a lo al maximum thi olution 
will ua ant to a maximal liqu . maximal liqu i on that i not 

a p op u t o an oth liqu . 

4.5 Binary Quadratic Formulation 

A binary solution column vector $m$ is defined such that if $m_{ij} = 1$ then
$v_{ij}$ is part of the final solution, and if $m_{ij} = 0$ then $v_{ij}$ is
excluded from the final solution. If the slack node $v_{i0}$ is part of the final
solution then the template region $i$ has no assignment. The columns and rows
corresponding to the slack nodes in the compatibility matrix are filled with 0
entries. From a graph theory point of view, the slack nodes are connected to all
other nodes with zero weight.

The binary quadratic formula $F(m)$ is defined as

    $F(m) = m^T C m$                                   (1)

where $C$ is the compatibility matrix defined in Section 4.3. In order to ensure
that each template region can be mapped to at most one search region, the final
solution is constrained such that

    $\sum_{j} m_{ij} = 1$  for all $i$.                 (2)
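The quadratic score and the row constraint can be illustrated with a minimal Python sketch. The flat indexing of $m$ (row $i$, column $j$ stored at position `i * n_cols + j`), the function names and the toy matrix are assumptions made for illustration only; they are not taken from the paper.

```python
def quadratic_score(m, C):
    """Return F(m) = m^T C m for a flat assignment vector m."""
    n = len(m)
    return sum(m[a] * C[a][b] * m[b] for a in range(n) for b in range(n))


def rows_sum_to_one(m, n_rows, n_cols, tol=1e-9):
    """Check constraint (2): the entries of every template row sum to 1."""
    for i in range(n_rows):
        row = m[i * n_cols:(i + 1) * n_cols]
        if abs(sum(row) - 1.0) > tol:
            return False
    return True


# Hypothetical toy problem: 2 template regions, 2 candidate nodes each.
# Nodes 0 and 3 are mutually compatible (weight 1), as are nodes 1 and 2.
C = [[0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 0]]

m = [1, 0, 0, 1]               # select node 0 for row 0 and node 3 for row 1
score = quadratic_score(m, C)  # both symmetric entries contribute
feasible = rows_sum_to_one(m, n_rows=2, n_cols=2)
```

Because $C$ is symmetric in this toy case, each compatible pair contributes twice to $F(m)$.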
A solution corresponding to a global maximum of $F(m)$ represents a set of
assignments with the largest amount of mutual compatibility. Any maximum of
$F(m)$ (global or local) represents a maximal clique on G. To show this, consider a
particular solution $m$ where there exist indices such that $m_{ij} = 1$ and
$m_{kl} = 1$ but the nodes $v_{ij}$ and $v_{kl}$ are incompatible assignments.
Clearly this is the only condition necessary for the solution $m$ not to qualify
as a clique. A second solution $\hat{m}$ is introduced, the same as $m$ except that
$\hat{m}_{ij} = 0$ and $\hat{m}_{i0} = 1$, which means that region $i$ has no
assignment. Using the definition of $C$ (Section 4.3) and equations 1 and 2 it can
be shown that the difference between $F(\hat{m})$ and $F(m)$ is

N 6 

F(m) - F(m) 0 - 2 ^ - 2{N - {N - 1)) 2 (3) 

q—1 r=l 

Therefore $F(m)$ does not represent a maximum, which means that only a clique
on G can generate a maximum of $F(m)$. The next step is to find a solution
which is a maximum of $F(m)$.



4.6 Approximating the Clique with the Largest Degree of Mutual 
Compatibility 



As previously stated, the search for the global maximum of 0-1 quadratic
equations is known to be an NP-complete problem, so that an approximate solution
to the optimum value of $F(m)$ will have to be estimated. The GAA is a recursive
routine used to solve a general assignment problem under the constraint that
assignments must be one-to-one. Any binary quadratic cost function can be
used to drive the optimization process. When generating the compatibility
matrix, two nodes $v_{ij}$ and $v_{kl}$ are considered incompatible if they map
template regions $i$ and $k$ to the same search region. Inclusion of $v_{ij}$ and
$v_{kl}$ in the final solution would contradict the statement that a final
solution is guaranteed to be a maximal clique. This means that the portion of the
GAA that prevents a many-to-one condition from occurring need not be implemented.
Initially $m$ is treated as a continuous vector. Several constraints are placed
on the optimization process:

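    $\forall i, j: \; m_{ij} \ge 0$                                        (4)

    $\forall i: \; \sum_{j} m_{ij} = 1$                                    (5)

During each iteration $t$ the update rule for the GAA is as follows:

    $m_{ij}(t+1) = \frac{\exp\left(\beta \, \partial F(t) / \partial m_{ij}(t)\right)}
                        {\sum_{j'} \exp\left(\beta \, \partial F(t) / \partial m_{ij'}(t)\right)}$   (6)

where $\beta$ is a positive number and $\partial F(t) / \partial m_{ij}(t)$ is the
partial derivative of $F(m)$ from equation 1 with respect to $m_{ij}$, evaluated
at iteration $t$; it is a double sum over the entries of $C$ weighted by the
current values of $m$.

The update equation 6 ensures that conditions 4 and 5 are maintained. Initially
$\beta$ is set to a low value so that multiple solutions can coexist. The value of
$\beta$ is gradually increased. As can be seen from equation 6, as $\beta$ becomes
large the values of $m$ are forced to discrete values of 0 or 1.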
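The update rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: $C$ is assumed symmetric (so the gradient of $m^T C m$ is $2Cm$), the slack column is omitted, and the schedule parameters `beta0`, `rate` and `steps` are hypothetical values chosen only for this toy example.

```python
import math


def gradient(m, C):
    """Gradient of F(m) = m^T C m, assuming C is symmetric: 2 * C m."""
    n = len(m)
    return [2.0 * sum(C[a][b] * m[b] for b in range(n)) for a in range(n)]


def softassign_step(m, C, beta, n_rows, n_cols):
    """One update: exponentiate the scaled gradient, then normalize each row."""
    g = gradient(m, C)
    new_m = []
    for i in range(n_rows):
        row = g[i * n_cols:(i + 1) * n_cols]
        top = max(row)  # subtracting the row maximum avoids overflow
        exps = [math.exp(beta * (v - top)) for v in row]
        total = sum(exps)
        new_m.extend(e / total for e in exps)
    return new_m


def graduated_assignment(C, n_rows, n_cols, beta0=0.5, rate=1.5, steps=30):
    """Start from a uniform continuous vector and raise beta gradually."""
    m = [1.0 / n_cols] * (n_rows * n_cols)
    beta = beta0
    for _ in range(steps):
        m = softassign_step(m, C, beta, n_rows, n_cols)
        beta *= rate
    return m


# Hypothetical toy problem: pairing nodes 0 and 3 is stronger than 1 and 2.
C = [[0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [1.0, 0.0, 0.0, 0.0]]

m_final = graduated_assignment(C, n_rows=2, n_cols=2)
```

The row-wise normalization keeps every entry non-negative and every row summing to one, and the growing `beta` pushes the entries toward 0 or 1, mirroring the behaviour described in the text.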

Figure 6 shows an example of the optimization process. A sequence of snapshots
graphically displays the evolution of the solution vector for a template
image of 15 regions. At the first initial iteration the NULL assignments are
favored because of the inconsistencies between rival solutions. Between time 1
and time 3 a dominant solution begins to emerge. The solution is refined during
time 4 and time 5. At time 6 the algorithm has converged to a final solution and by
time 7 the coefficients have taken on binary values.

5 Results 

The algorithm was tested on several sets of color images. The first example was
a computer manual shown in Figure 7. The manual was easily found in different
images of a cluttered table-top even when the manual was partially occluded.
Note that a second manual shown in the image is not found since it is actually
a different color, though this is not obvious from the grey-scale images shown in
the paper. Other examples are shown in Figures 10 and 9.

6 Conclusion 

The amalgamation of region segmentation algorithms with modern color constancy
methods gives the possibility of improved object recognition in color and
multi-spectral images. The adoption of an inexact graph-matching approach
makes recognition independent of moderate lighting and view-point changes.
The graph matching approach was able to generate solutions with consistency
at multiple levels. The region adjacency graphs were able to highlight image to
template correspondences with strong local support. By insisting that the final
solution must represent a clique on the association graph, global consistency
was achieved. Although the maximum clique problem is NP-complete, it was
demonstrated that strong maximal cliques can be generated using a variation of
the graduated assignment algorithm.
Fig. 6. Illustration of the GAA optimization process (evolution of the decision
vector). The coefficients for the solution vector m are shown at various points
in time. Each row represents the coefficients corresponding to a particular
template region. The last column at each time represents the coefficients for
the NULL assignments. Initially the coefficients take on continuous values
between 0 and 1. By the end of the process only binary values exist.



Fig. 7. The computer manual used as a template 



Fig. 8. Two examples of recognition. On the left the search image, and on the 
right the outlines of the regions matched against the template. 
Fig. 9. Recognition of cup image. On the left is the template, in the center the
search image and on the right the identified regions of the located cup. Note that
the cup in the search image is seen from a different angle from the template
image. The letters REG are visible in the template, but only RE is visible in the
search image.




Fig. 10. Recognizing a building. On the left the template, and on the right the
search image showing the recognized building.
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1 Introduction 
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int t with th nvi onm nt. n o m tion- u ion t t gi 4 n mpl 
o uit 1 go 1 -o i nt pp o h 9 . Th omput tion n iv n y 
ompl m nt y in o m tion ou n it m y volv on th i o ptiv 
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- Early vision, (DST); 

j t t tion (OBD), realized by two co-operating agents (segmentation 
(Snake) and feature extraction (OST)) 
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Fig. 1. Th h m o th o j t ognition y t m. 



Th p opo ttgyh nhon u thp omn o n 
ti i 1 vi u 1 y t m i t ongly influ n y th in o m tion p o ing th t 
i on u ing th ly vi ion ph to reduce th hug mount o in o m - 
tion oil t y th vi u 1 y t m ( oth n tu 1 n ti i 1) n to voi th 
oil p t high omput tion 1 v 1 ( T ot o 14 ). 

T i m n 1 ugg t th t p - tt ntiv p o ing How to o u pi to i 1 

omput tion on gion o int t. n t meaningful im g n h p tu 

(.g. g mil ink nttu) It u ing thi ph . Thi 

p ility i un m nt 1 p t o n tiv vi ion y t m ( loimono 1 ). 

tt ntion i hi v t i nt 1 v 1 o t tion t ting om ly vi ion. 

t h 1 v 1 i nt p igm o omput tion n on i ( om lo 1 
to ym oli ) in lu ing t t mo ling ( own 2 ). 

t ting ou pp o h on ompl n n ly i y in lu ing it in 

th vi u 1 y t m o th o ot y t m 3 . ig.2 how typi 1 1 

wo 1 n t t y th vi u 1 y t m. t p nt k with 
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V 1 typi 1 o j t . Th n h ow o o j t n th on ition o 

illumin tion i n tu 1. 




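Fig. 2. An example of a complex scene with objects on a desk.

The paper is organized as follows. Section 2 describes the early vision task.
The object detection module is described in Section 3. Section 4 is dedicated to
the classifier design. Experimental results and discussion are given in Section 5.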

2 The Early Vision Phase 
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0 ~ 9i,j — G — 1. 
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howing symmetries. 1 v n o ymm t y in vi ion w 1 y not y P y- 
hologi t hoi n 11 h in 11 . 

ymm t yop to h v n in lu invi ion y t m top o m i nt 
vi u 1 t k . o mpl th y h V n ppli to p nt n i 

o j t-p t ( lly 10 ) n to p o m im g gm nt tion in u h 7 . n 

i 1 13 m u o ymm t y i int o u to t t point o int t 

in n . 

Th i t ymm t y T n o m (DST) o D h i in 

y th p o u t o two lo 1 op to 

DSTij = Fij — Eij 
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Th t op to i un tion o th i 1 mom nt omput in win ow Ck 
o lin iz 2 /c + 1 n nt in (i,j) 



rph _ Y^r=+k 
-^i,j Z-^r= — k 



E s—-\-k 
s= — k 



r - sin{^) - s- cos(^)| - g, 



with h = 0, 1, 2, n — 1 wh n i th num o ymm t y u . Th 
un tion F p n on th kin o ymm t y to t t . o mpl in 

o nnul ymm t y 

n In 

Th on op to w igh F o ing to th lo 1 moothn o th im g 
n it i n 




Cfc,(r,s) Ck+i \3l,rn 9r,s\ 

wh pi 1 {l,m) n (r, s) mu t 4- onn t ((Z — r)^ + (m — s)^ = 1). t 

i y to th t Eij = 0 i th im g i lo lly fl t. 

n ig.3 th DST o th n in ig.2 i hown. ight pi 1 nt o 

high lo 1 i ul ymm t y. 




Fig. 3. The DST of the scene in Fig. 2.
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Th in i to {gs n ug) th n u to v lu t th 1 tion o th 
o int t y m n o th ul 



T{DST{D),ij.s,a)s,a) 



DST{D) i DST[D) > p .5 + a — us 
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wh Of — 0 (in ou p im nt a = 3). ig.4 how th ult o th 1 tion 
ul on th t n o m im g in ig.3. 




Fig. 4. Points of interest selected in the scene in Fig. 2.



3 Object Detection 

Th OBD mo ul p o m th t tion o th zon ont ining n i t 

o j t o mpl in ig. th u ik’ u i t t . Th v lu in th 

t t zon th n u to omput nit 1 o j t ipto . 

Th g nt p opo o thi t k qui qu n o m im g y 

o ot m thi qu n p nt i nt vi w oun th o j t. h 
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20 . 

Th Object Symmetry Transform [OST g nt) 6 i th n ppli on h 
2 -vi w. 
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Fig. 5. j t t tion mo ul . 
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t mu t point out th t u ing thi ph th OBD i iv n 1 o y 

th n mo 1 knowl g th t i uilt y t king in ount th p i 

go 1 (in ou look t th o j t on th k ). 

3.1 Snakes and Segmentation 
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3.2 Object Axial Symmetries and Features Extraction 
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ot th t th ho oh p n on th influ n th t i giv n to th i - 

t n d tw n th pi 1 n th giv n i . mpl o h- un tion 

) h{p) = d 
) h{p) = l/d 

) ^(P) = 

) h{p) = 

Th Igo ithm to n nit o ymm t y n ily impl m nt 

in th i t . i t o 11 th y nt b o th o j t i t min 

th n th m u MSgiOi) ip o m o 0 = ^ with fc = 0, 1, n — 1. 
M im o MSg 1 o n i t to ymm t y o Oi. 

t i y to th t th omput tion tim o thi t k i 0{n — N'^). 

Th OST omput tion i p o m y omputing th i 1 ymm t y o 
i nt o j t vi w ( ig.6 ). t ollow th t 

OST{k,i){0) = MSk{0,) 

t tu n out th OST i n im g o im n ion n — mtht p nt to 

vi w o 3 -o j t. n th ollowing w will to it OST- p nt tion 
o O. 

ig. how th ult o th ppli tion o th OST to th u ik’ u . 
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Fig. 8. Th OST o th u ik’ u t t in ig.7. 

4 The Object Recognizer 

In the following the classifier agent is described. In our case a Bayesian
classifier has been considered. Let $p_1, p_2, \ldots, p_L$ be the set of object
prototypes already represented by their OST. The classifier assigns an unknown
object $x$ to the class represented by the closest prototype:

    $x \in \mathrm{Class}_i \iff \rho(x, p_i) = \min_{1 \le k \le L} \{ \rho(x, p_k) \}$

where $\rho$ is a similarity function. The similarities used in our case are the
normalized correlation (NC) (pseudo-metric), the Euclidean (ED) and the Hamming
(HD) distances. All of them range in the interval [0, 1].

Prototypes are generated by synthetic shapes (parallelogram (PA), cone (CO),
cube (CU), cylinder (CY), pyramid (PY), ellipsoid (EL), sphere (SP),
torus (TO) and pot-like (PL)) representing sketches of real objects on the desk
(Rubik's cube (RC), pen-holder (PH), paper-weight (PW) and mug (BM)).

In Table 1 the correspondence between prototypes and real objects is shown; this
corresponds to a simplified world model.

Table 2 shows the classification results of the Rubik's cube. Note that all
the similarity functions led to a correct classification.
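The closest-prototype rule can be sketched as follows. This is a minimal illustration, assuming binary feature vectors and a normalized Hamming distance; the prototype vectors and the function names are hypothetical and do not come from the paper, whose prototypes are OST images.

```python
def hamming_distance(a, b):
    """Fraction of positions where two equal-length vectors differ, in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for u, v in zip(a, b) if u != v) / len(a)


def classify(x, prototypes, distance):
    """Return the label of the prototype with the smallest distance to x."""
    return min(prototypes, key=lambda label: distance(x, prototypes[label]))


# Hypothetical binary descriptors standing in for OST representations.
prototypes = {
    "CU": [1, 1, 0, 0],
    "SP": [0, 0, 1, 1],
    "CY": [1, 0, 1, 0],
}

label = classify([1, 1, 0, 1], prototypes, hamming_distance)
```

Any other distance with the same call signature could be passed as `distance`, which is one way to compare several similarity functions on the same prototypes.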

5 Experiments and Discussion 

The object-recognition system has been tested on real world scenes. However,
in this preliminary experimentation the kinds of objects in the scene have been
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Object 


Model 


RC 

PH 

PW 

BM 


cube 

cylinder parallelogram ellipsoid 
cone pyramid sphere 
pot — like 



Table 1. Object-model correspondence.



ρ     PA    CO    CU    CY    EL    PY    SP    PL    TO
NC    0.14  0.40  0.05  0.20  0.34  0.25  0.42  0.23  0.30
ED    0.21  0.16  0.05  0.0   0.12  0.21  0.1   0.1   0.1
HD    0.16  0.12  0.03  0.04  0.16  0.1   0.15  0.12  0.13

Table 2. Classification results of the Rubik's cube.



limited to those included in the object data-base. Table 3 summarizes the
classification results obtained using the similarity functions with a vote strategy
that computes the mean value of the three distance functions from the prototypes.
It is evident that the maximum of similarity has been reached in correspondence
with the model. Note that in the reported example in Table 3 the paper-weight
is spherical and the pen-holder is cylindrical.



Object  PA    CO    CU    CY    EL    PY    SP    PL    TO
RC      0.1   0.22  0.04  0.0   0.13  0.25  0.19  0.1   0.19
PH      0.10  0.1   0.11  0.03  0.13  0.21  0.16  0.11  0.19
PW      0.26  0.1   0.15  0.15  0.19  0.22  0.08  0.2   0.23
BM      0.13  0.21  0.1   0.14  0.15  0.30  0.19  0.08  0.14

Table 3. Object classification.



A real experimentation allowing the presence of any kind of object in the
scene should be performed to test the robustness of our approach. Nevertheless,
the examples shown in this paper are quite realistic and could be a good starting
point for controlled environment systems.

Th y t m h n impl m nt in th i t i ut nvi onm nt. 

n oimg o iz 26 — 26 th tim to n lyz ingl 2 -vi w w 

o 3 sec ( om th qui ition o th m to th omput tion o th OST) n 
w msec, to p o m th 1 i tion. Th y t m i 1 to p o m real time 
omput tion in t th qui ition t i o out sec. ( thi tim in lu 

th o ot mov m nt oun th o j t). 

Further work will be addressed to consider views of the object different from
orthogonal ones. Prototype generation should consider more sophisticated object
modeling.
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Abstract. Computer vision often involves the estimation of models of 
the world from visual input. Sometimes it is possible to fit several dif- 
ferent models or hypotheses to a set of data, the choice of exactly which 
model is usually left to the vision practitioner. This paper explores ways 
of automating the model selection process, with specific emphasis on the 
least squares problem, and the handling of implicit or nuisance parame- 
ters (which in this case equate to 3D structure). The statistical literature 
is reviewed and it will become apparent that although no one method 
has yet been developed that will be generally useful for all computer vi- 
sion problems, there do exist some useful partial solutions. This paper is 
intended as a pragmatic beginner’s guide to model selection, highlighting 
the pertinent problems and illustrating them using two view geometry 
determination. 



1 Introduction 

Robotic vision has its basis in geometric modeling of the world, and many vision
algorithms attempt to estimate these geometric models from perceived data.
Usually only one model is fitted to the data. But what if the data might have
arisen from one of several possible models? In this case, the fitting procedure
needs to account for all the potential models and select which of these fits the
data best. This is the task of robust model selection which, in spite of the many
recent developments in the application of robust fitting methods within the field
of computer vision, has been, by comparison, quite neglected.

This paper reviews current statistical methods in model selection with respect
to determining the two view geometric relations from the point matches between
two images of a scene, e.g. the fundamental matrix [13, 19]. These relations
can be used to guide matching [52, 60] and thence estimate structure [4] or
segmentation [51]. There are several two view relations that could describe an
image pair. Hence it is necessary to estimate the type of model as well as the
parameters of the model.

The paper is laid out as follows: Section 2 describes four two view motion
models as well as their associated degrees of freedom. Section 3 describes the
maximum likelihood method for estimation. The use of just maximum likelihood
estimation will always lead to the most general model being selected as
most likely. Therefore section 4 introduces the likelihood ratio test for
comparing two models. Although not generally useful for comparison amongst multiple
models, its exposition will provide insight into the failings of some of the scoring
criteria explained later. Section 5 describes the AIC criterion and some variants
of it, with specific emphasis on the least squares fitting problem. It is found that
the AIC consistently overestimates the number of parameters in the model; the
reason for this is explained and some modifications are suggested to compensate.
In Section 6 the Bayesian approach is detailed; the main problem with the
Bayesian approach is the specification of the priors, and the BIC (Bayesian
information criterion) approximation is discussed. In section 7 minimum description
length ideas for providing priors are only briefly touched on as they lead to the
same sort of algorithms as the Bayesian approach. Finally the benefits of model
averaging are outlined in section 8. Results are described throughout the text and
summarized in section 9. The discussion in section 10 covers some commonly
asked questions about model selection in computer vision, and in the conclusion
it will be seen that this is a far from solved problem.

Notation: 3 s n oint roj ts to x, n x = x^ in th first n 

s on im g s, wh r x = (xi X 2 X 3 is homog n ous thr v tor h 

inhomog n ous oor in t of n im g oint is ( =(12 h orr s on- 

n will Iso r r s nt y th v tor m = (x y x y , th s t of 11 

oint m t h s tw n two vi ws will not y Nois fr (tru t 
will not y n un rs or _, stim t s , noisy (i m sur t s 

h ro ility nsity fun tion ( f of giv n is r( is two 

vi w r 1 tion, n r th r m t rs of th t r 1 tion 



2 Putative Motion Models 

This section describes the putative motion models which can constrain the rigid
motion of points between two views. Each motion model is a relation $R$
described by a set of parameters $\theta$ which define one or more implicit
functional relationships $g(m) = 0$ (0 is the zero vector) between the image
coordinates, i.e. $g_i(m; \theta) = 0$.

There are four types of relations described in this section. Firstly, the
relations can be divided between motions for which camera position and structure
can be recovered (3D relations) and motions for which it cannot, for instance
when all the points lie on a plane, or the camera rotates about its optic centre
(2D relations). A second division is between projective and orthographic (affine)
viewing conditions (a more complete taxonomy is given in [57]).

Suppose that the viewed features arise from a 3D object which has undergone
a rotation and non-zero translation. After the motion, the set of homogeneous
image points $x_i$, $i = 1 \ldots n$, is transformed to the set $x'_i$. The two
sets of features are related by the implicit functional relationship
$x'^T_i F x_i = 0$, where $F$ is the rank 2, 3 x 3 [13, 19] fundamental matrix;
this is relation 1. The fundamental matrix encapsulates the epipolar geometry. It
contains all the information on camera motion and internal camera parameters
available from image feature correspondences alone.

When there is degeneracy in the data such that a unique solution for $F$
cannot be attained, it is desirable to use a simpler motion model, for instance
when the camera is only rotating about its optic centre. Three other models are
considered. Relation 2 is the affine camera model of Mundy and Zisserman [38]
with linear fundamental matrix $F_A$. The affine camera is applicable when the
data is viewed under orthographic conditions and gives rise to a fundamental
matrix with zeros in the upper 2 by 2 submatrix ^. The homography $x' = Hx$
is relation 3, and the affinity $x' = H_A x$ is relation 4; these arise when the
viewed points all lie on a plane or the camera is rotating about its optic centre
between images, the homography being in the projective case, the affinity in the
orthographic.



Relation       c  k  d  Constraint        Parameters
General F      7  7  3  x'^T F x = 0      F   = [f1 f2 f3; f4 f5 f6; f7 f8 f9]
Affine F_A     4  4  3  x'^T F_A x = 0    F_A = [0 0 g1; 0 0 g2; g3 g4 g5]
Homography H   4  8  2  x' = H x          H   = [h1 h2 h3; h4 h5 h6; h7 h8 h9]
Affinity H_A   3  6  2  x' = H_A x        H_A = [a1 a2 a3; a4 a5 a6; 0 0 a7]

Table 1. A description of the reduced models that are fitted to degenerate sets
of correspondences. c is the minimum number of correspondences needed in a
sample to estimate the constraint; k is the number of parameters in the relation;
d is the dimension of the constraint.



Model Complexity: Following the maxim of Occam “Entities are not to be
multiplied without necessity” ^, model selection typically scores models by a
cost function that penalizes models with more parameters. It is convenient

^ Actually Fa occurs in the non-orthographic case when the optical planes of the 
two cameras coincide [ 50 ]. Triggs claims (personal communication) that affine recon- 
struction in this case gives projectively correct results. 

^ In fact Occam did not actually say this, but said something which has much the 
same effect, namely: ‘It is vain to do with more what can be done with fewer’. That 
is to say, if everything in some science (here computer vision) can be interpreted 
without assuming this or that entity, there is no ground for assuming it [ 44 ]. 
at this point to introduce the number of explicit degrees of freedom for the
parameters of each model. The fundamental matrix has 7 degrees of freedom,
the homography has 8, and yet the fundamental matrix is more general. The
affine fundamental matrix has 4 degrees of freedom, the affinity 6; again the
affine fundamental matrix is more general. This seeming paradox can be resolved by
considering the dimension of the model.

In addition to the degrees of freedom in the parameters we shall see that the
complexity of a model is also determined by its dimension, which is defined now.
A pair of corresponding points $x$, $x'$ defines a single point $m$ in a
four-dimensional measurement space, formed by joining the coordinates in each
image. These image correspondences, which are induced by a rigid motion, have an
associated algebraic variety in this space. The fundamental matrix and affine
fundamental matrix for two images are dimension 3 varieties of degree 4 (quartic)
and 1 (linear) respectively. The homography and affinity between two images are
dimension 2 varieties of degree 2 (quadratic) and 1 (linear) respectively. The
properties of the relations are summarized in table 1. This, loosely speaking,
means that the fundamental matrix describes a three dimensional surface in the
measurement space and the homography a two dimensional surface. Each point on this
fundamental matrix surface (affine or general) is simply a matched point between
two of the images, and it has three degrees of freedom, equivalent to the fact
that it maps to a three dimensional point in the scene. Similarly each point on a
homography (or affinity) surface represents a match with two degrees of freedom.

3 Maximum Likelihood Estimation 

Within this section the maximum likelihood estimate of the parameters of a
given relation is non-rigorously derived. Although this is a standard result,
the derivation will reveal that there are more parameters to consider in
the model formulation than just the explicit parameters of the relation given in
the last section and table 1. These additional parameters are sometimes referred
to as nuisance parameters [47]. This is important as later it will be seen that
the prior distribution of the relation is related to the number of parameters
that need to be estimated. Furthermore, deriving the maximum likelihood error from
first principles is a useful exercise, as there is a long history of researchers
using ad hoc error measures to estimate multiple view relations which are
suboptimal.

L t m V tor of th os rv oor in t s m = (x y x y t is 

ssum th t th nois on m is ussi n m = m + e with ov ri n m - 

trix h ov ri n m trix for s t of orr s on n s, r r s nt y 

st k V tor = (iri]^ , is ^ = i g ( n this r it is 

ssum th t th nois in th lo tion of f tur s in 11 th im g s is ussi n on 
h im g oor in t with z ro m n n uniform st n r vi tion , thus 
= I ^ ( xt nsion to th mor g n r 1 s is not iffi ult n is s ri y 

K n t ni 26 11 th t th tru v lu m of m s tisfi s th im li it fun - 

tion 1 r 1 tionshi s ( g Ig r i olynomi Is =1 for im nsion 3 v ri ti s, 

n =2 for im nsion 2 v ri ti s in two im g s 
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^^(m; =0 i = 1 q (1 

iv n ^ (th 1 st ing th stim t v ri n of th nois , whi h ty - 

i lly n in n ntly riv from th ro rti s of th f tur m t h r 

th ro ility nsity fun tion of s t of o s rv orr s on n s is 

27T " 



wh r g( ; =0 o fin th m ximum lik lihoo solution, th n g tiv log- 

lik lihoo 

- =-log r( - - ^ (3 

is minimiz su j t to th r stri tions of g o om lish this L gr ng mul- 
ti li rs r us , th riv tiv s of 

- =2ulog ^ + ' + ^g("r (4 

with r s t to^ ^28 r qu t to z ro h s qu tions r 47 

nl 1 / ^ / 

2 “ “ “ 2 4 ( ~ ( “ “0 

— =“ +S =0 



= =0 

2 

§(■'; =0 

wh r S n T r th o i ns of th im li it fun tion 1 r 1 tionshi s g( ; 
with r s t to n r s tiv ly S = > T = h 

solution of this s t of qu tions r th m ximum lik lihoo stim t s of th 
nois , th r m t rs for th r 1 tion, n th st fitting orr s on n s 
ssuming is giv n, th num r of fr r m t rs in this syst m is th 
num r of gr s of fr om k in giv n in 11, lus num r of gr s 
of fr om in h orr s on n m o ys th onstr ints giv n y g n 

h n li s in th V ri ty fin y — ; su h th t it is th 1 st squ r s ist n in 
— “ w y from th o s rv m t h m, thus h orr s on n m h s gr s 
of fr om s giv n y 11; thus th num r of gr s of fr om in is 
n (th xtr num r of nuis n r m t rs — Ithough in this ro 1 m th s 

r m t rs r f r from ing nuis n s th y im li itly fin th stim t 

stru tur of th s n The total number of parameters to be estimated (excluding 
) is p = k + n , n th tot 1 num r of o s rv tions is = 4n, oth of whi h 

r im ort nt for th riv tion of onfi n int rv Is in th n xt s tion 

® At this juncture the selection of the functional form of the data — the most appro- 
priate motion model is assumed known. 
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iv n th t = I th n th n g tiv log lik lihoo (3 of 11 th orr s on- 

n s iHi, i = 1 n wh r n is th num r of orr s on ns, is 



-J J \ 


2 




2 




xl -xl j 




— 




m — m 


u ~ y 






i 






(5 



wh r th log lik lihoo of giv nmthis (m — =(^(^> is ounting th 
onst nt t rms (whi h is quiv 1 nt to th r roj tion rror of rtl y n 
Sturm [20]). If the type of relation is known then, observing the data, we can
estimate the parameters of the relation to minimize this negative log likelihood.
This inference is called ‘Maximum Likelihood Estimation’ (Fisher 1936 [14]).
Numerical methods for finding these two view relations are given in [5, 53, 56].
To enforce the constraints on the parameters, such as det F = 0, sequential
quadratic programming (SQP) [16] is used, a state of the art method for solving
constrained minimization problems, which has been found to outperform many other
methods in terms of efficiency, accuracy, and percentage of successful solutions,
over a large number of test problems [45].

Robust Estimation: The above derivation assumes that the errors are Gaussian;
often, however, features are mismatched and the error on m is not Gaussian.
Thus the error is modeled as a mixture model of a Gaussian and a uniform
distribution:

    $\Pr(e) = \gamma \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{e^2}{2 \sigma^2}\right) + (1 - \gamma) \frac{1}{v}$    (6)

where $\gamma$ is the mixing parameter, $v$ is just a constant, and $\sigma$ is
the standard deviation of the error on each coordinate. To correctly determine
$\gamma$ and $v$ entails some knowledge of the outlier distribution; here it is
assumed (without a priori knowledge) that the outlier distribution is uniform,
with $-v/2 \ldots +v/2$ being the pixel range within which outliers are expected
to fall (for feature matching this is dictated by the size of the search window
for matches). Usually the model selection methods are derived under Gaussian
assumptions, but this assumption is not necessary, and the methods are equally
valid using the robust function for probability given above (e.g. see Ronchetti
in [17], or [41]). Thus in all that follows the robust likelihood is used.
Rather than minimize (6), it is often computationally simpler to minimize a
robust function [23] of the form



( 



2 if 2 < 3 

3 if 3 



(7 



wh r is th num r of gr s of fr om in (2 for H, 1 for F , n 3=4 

h thr shol 3 = 4 orr s on s to th 95% onfi n 1 v 1 his m ns th t 

n inli r will only in orr tly r j t 5% of th tim 

This form of the function has several advantages. Firstly, it provides a clear
dichotomy between inliers and outliers. Secondly, outliers to a given model are
given a fixed cost, reflecting that they probably arise from a diffuse or uniform
distribution, the log likelihood of which is a constant, whereas inliers conform
to a Gaussian model. Furthermore, if the outliers follow a sufficiently diffuse
uniform distribution then they will only incorrectly be flagged as inliers a small
percentage of the time (false positives).

The robust cost function (7) allows the minimization to be conducted on all
correspondences, whether they are outliers or inliers. Use of the robust function
limits the effects of outliers on the minimization. Typically, as the minimization
progresses many outliers are redesignated inliers.
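The behaviour described above, a quadratic cost for inliers and a fixed cost for outliers, can be sketched as a truncated quadratic. This is an illustration only: the function name and the generic `threshold` argument are assumptions, and the example value 4.0 is used purely for demonstration rather than as the exact threshold of equation (7).

```python
def robust_cost(squared_errors, sigma, threshold):
    """Sum a truncated quadratic cost and flag each match as inlier or outlier.

    Each squared error is scaled by the noise variance; scaled errors below
    `threshold` keep their value, all others are charged the fixed `threshold`.
    """
    total = 0.0
    inlier_flags = []
    for e2 in squared_errors:
        scaled = e2 / (sigma ** 2)
        if scaled < threshold:
            total += scaled
            inlier_flags.append(True)
        else:
            total += threshold
            inlier_flags.append(False)
    return total, inlier_flags


# Two well-fitting matches and one gross mismatch (hypothetical values).
cost, flags = robust_cost([0.5, 1.0, 100.0], sigma=1.0, threshold=4.0)
```

Because the flags are recomputed on every evaluation, a match that starts as an outlier can become an inlier as the parameter estimate improves.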

Problems with MLE: If the type of relation is unknown then maximum
likelihood estimation cannot be used to decide the form of the relation, as the
most general model will always be most likely, i.e. have the lowest SSE. In
table 2 the average sum of squares of this error (SSE) is shown for 100 sets of
100 synthetic matches. The matches were generated to be consistent with random
$F$, $F_A$, $H$ or $H_A$ constraints, i.e. with general motion, orthographic
projection, camera rotation, or orthographic plane, with Gaussian noise
$\sigma = 1$ added to the projected match coordinates. Each model is estimated
for each data set and the mean of the SSE recorded. It can be seen that just
picking the model with the lowest SSE will in general always lead to choosing the
most general model, $F$; thus the need for a more sophisticated model selection
method. Fisher was aware of the limitations of maximum likelihood estimation and
admits the possibility of a wider form of inductive argument that would determine
the functional form of the data (Fisher 1936 p. 250 [14]), but then goes on to
state ‘At present it is only important to make clear that no such theory has been
established’.



                                     Point Motion
Estimated        General        Orthographic   Rotation        Affinity
                 F              F_A            H               H_A
Fundamental F    93.074 (93)    87.037         80.6162         78.378
Affine F_A       978.350        96.448 (96)    806.389         85.875
Homography H     4986.881       4834.735       193.964 (192)   189.132
Affinity H_A     4993.045       4967.894       1023.118        191.643 (194)

Table 2. Mean SSE for 100 matches over 100 trials. Variance of noise on the
coordinates: σ² = 1. Bracketed values are the expected value if the model is
correct.
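The bracketed values in Table 2 are consistent with the number of observations minus the number of estimated parameters, using the counts k and d from Table 1, four observed coordinates per match, and unit noise variance. The short sketch below checks this arithmetic; the dictionary and function names are illustrative assumptions.

```python
# (k, d) per model, taken from Table 1: explicit parameters and dimension.
MODELS = {
    "F": (7, 3),
    "F_A": (4, 3),
    "H": (8, 2),
    "H_A": (6, 2),
}


def expected_sse(n_matches, k, d):
    """Observations (4 per match) minus parameters (k plus d per match)."""
    observations = 4 * n_matches
    parameters = k + n_matches * d
    return observations - parameters


expected = {name: expected_sse(100, k, d) for name, (k, d) in MODELS.items()}
```

For 100 matches this reproduces the bracketed entries 93, 96, 192 and 194.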



4 Model Selection — Hypothesis Testing 

One way of model comparison is via hypothesis testing. This procedure tests
the null hypothesis that one relation $R_1$ describes the data by comparing it to
an alternate hypothesis that the relation $R_2$ describes the data. The relations
must be nested so that the parameters of the more general model 1 include all
the parameters of the less general model 2 in addition to some extra ones.
A common way to do this is by the likelihood ratio test (e.g. see [31, 35]). To do
this the MLE of the parameters $\theta_1$ and $\theta_2$ for both relations must
be recovered. Then the test statistic

    $\lambda(D) = 2 \log \frac{\Pr(D \mid R_1, \hat{\theta}_1)}{\Pr(D \mid R_2, \hat{\theta}_2)} = 2 (L_1 - L_2)$    (8)

is examined, where $R_1$ has more parameters than $R_2$, which asymptotically ^
follows a $\chi^2$ distribution with $p_1 - p_2$ degrees of freedom, where $p_i$
is the total number of parameters in model $i$ and $L_i$ is its maximized log
likelihood. If $\lambda(D)$ is less than some threshold (determined by a level of
significance $\alpha$) then model 2 is accepted, otherwise model 2 is rejected,
i.e. the test is

    $\lambda(D) = 2 (L_1 - L_2) < \chi^2(\alpha, p_1 - p_2)$    (9)

If $R_2$ holds then the chance of overfitting is the user specified $\alpha$. If
$R_1$ holds then the chance of underfitting is an unknown $\beta$, and the
quantity $1 - \beta$ is referred to as the power of the test. The power of the
test is obviously related to the choice of $\alpha$ and the distribution of the
data. For instance, if all the matches have small disparities then the power of
the test for a given $\alpha$ is likely to be much lower than if the disparities
are higher. Ideally $\alpha$ should be chosen so that the chances of overfitting
($\alpha$) and underfitting ($\beta$) are small (i.e. the power of the test is
high).

In the Neyman-Pearson theory of statistical hypothesis testing only the
probabilities of rejecting and accepting the correct and incorrect hypotheses,
respectively, are considered to define the cost of a decision. The problem with
this approach is that it is difficult to adapt to a situation where several
models might be appropriate, as the test procedure for a multiple-decision
problem involves a difficult choice of a number of dependent significance levels.
This suggests a different approach in which a scoring mechanism is used to rank
each model. As seen, maximum likelihood methods will always lead to the most
general model being selected; hence the need for a more general method of
inductive inference that takes into account the complexity of the model. This has
led to the development of various information criteria (see the special issue of
Psychometrika on information criteria, Vol. 52, No. 3). Foremost amongst these is
‘an information criterion’ (AIC) (Akaike 1974), described next.
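The likelihood ratio test can be sketched as a small decision function. This is an illustration under stated assumptions: the maximized log likelihoods are supplied by the caller, and the chi-square critical value is passed in rather than computed. The value 3.841 used below is the standard 5% critical value for one degree of freedom; the log-likelihood numbers are hypothetical.

```python
def likelihood_ratio_statistic(loglik_general, loglik_simple):
    """Test statistic: twice the log-likelihood gain of the general model."""
    return 2.0 * (loglik_general - loglik_simple)


def accept_simpler_model(loglik_general, loglik_simple, critical_value):
    """Accept the simpler (nested) model when the statistic is below the threshold."""
    statistic = likelihood_ratio_statistic(loglik_general, loglik_simple)
    return statistic < critical_value


# Hypothetical nested models differing by one parameter.
CHI2_5PCT_1DOF = 3.841
statistic = likelihood_ratio_statistic(-46.5, -48.0)
keep_simple = accept_simpler_model(-46.5, -48.0, CHI2_5PCT_1DOF)
```

Passing the critical value explicitly makes the dependence on the significance level and on the difference in parameter counts visible at the call site.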

5 AIC for Model Selection 

k ik ’s inform tion rit rion is us ful st tisti for mo 1 i ntifi tion n 
V lu tion k ik (1974 w s rh s th first to 1 y th foun tions of inform - 
tion th or ti mo 1 v lu tion v lo mo 1 s 1 tion ro ur -for 

us in uto-r gr ssiv mo ling of tim s ri s-th t hos th mo 1 with mini- 
mum X t r i tion rror for futur o s rv tions s th st fitting h 

meaning as the number of observations tends to infinity 
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ro ur s 1 ts th mo 1 th t minimiz s x t rror of n w o s rv tions 

with th s m istri ution s th on s us for fitting ^ t h s th form 

AIC = -2 +2p (10 

where $p$ is the number of parameters in the chosen model, and $L$ is the log likelihood. It
can be seen that AIC has two terms, the first corresponding to the badness of fit, the
second a penalty on the complexity of the model. When there are several competing models,
the parameters within the models are estimated by maximum likelihood and the AIC scores
compared to find the model with the minimum value of AIC. This procedure is called the
minimum AIC procedure, and the model with the minimum AIC is called the minimum AIC
estimate (MAICE), which is chosen as the best model. Therefore the best model is the one
with highest information content but least complexity. An advantage of the AIC is its
simplicity, as it does not require reference to look-up tables. It is very easy to
calculate AIC once the maximum likelihood estimate of the model parameters is made.
Furthermore, Akaike claims that there is no problem of specifying an arbitrary significance
level at which models should be acceptable, and the two models being compared need not be
nested or ordered.

The AIC is developed from the idea that the best model is that which minimizes the
expected SSE for future data. Consider the case of fitting a variety of dimension $d$ to
$D$ dimensional points (recall our definition of relations in terms of varieties in
section 2); in this case the codimension is $D - d$. Kanatani [26] points out that the AIC
in this case is

$$\mathrm{AIC} = -2L + 2(dn + k) \qquad (11)$$

Kanatani's derivation of the AIC for least squares is rather drawn out, and the interested
reader is referred to his book [26]. Akaike [3] gave a similar form for the AIC in the case
of factor analysis when fitting models of differing dimensions. Rather than present it
here, an intuitive interpretation is presented in the next section, for the two
dimensional case $D = 2$, fitting a line model ($d = 1$) and a point model ($d = 0$).

Intuitive Interpretation. Consider equation (11): the first term is the usual sum of
squares of residuals, divided by their variances, representing the goodness of fit. The
next two terms represent the parsimony of the model. The second is a penalty term for the
dimensionality of the model: the greater the dimension of the model, the greater the
penalty. The last term is the usual AIC criterion of adding the number of parameters of the
model, to penalize models with more parameters.

This is now illustrated by a simple example. Consider the two dimensional example shown in
figure 1. Suppose points are generated from a fixed location

® He later demonstrated that AIC was an estimate of the expected entropy (Kullback- 
Leibler information) of the fitted distribution for the observed sample against their 
true one, showing that the model with the minimum AIC score also minimized the 
expected entropy, thus providing one way of generalizing MLE. 
Fig. 1. Showing the relationship between the noisy points, the optimally estimated
line and the centroid of the noisy points. For Gaussian noise the optimally estimated
line is that which minimizes the sum of squares of perpendicular distances,
and consequently it passes through the centroid of the data. For each point the
distance to the centroid may be broken up into two components, one parallel and
one perpendicular to the line.



with mean zero, unit standard deviation, Gaussian noise in both the $x$ and $y$ coordinates.
If a point and a line are fitted separately by minimizing the sum of squared Euclidean
distances, the optimally fitted point (the centroid) will lie on the optimally fitted
line [40]. Let the sum of squared distances of the points to the line model be
$\sum e_\perp^2$, and the sum of squared distances of the points to the point model be
$\sum e_p^2$; then $\sum e_p^2 = \sum e_\perp^2 + \sum e_\parallel^2$, where
$\sum e_\parallel^2$ is the 'parallel' sum of squared distances, as shown for one point in
Figure 1. It can be seen that unless the data all lie exactly on a point, $\sum e_\perp^2$ is
always less than $\sum e_p^2$. The AIC for the line model compensates for this bias by the
penalty term, which is twice the expectation of the 'parallel' sum of squares
($\sum e_\parallel^2$). If the model estimated is a line then the AIC has the form

$$\mathrm{AIC}(\text{line}) = \sum e_\perp^2 + 2n + 4 \qquad (12)$$

as the model has dimension one, codimension one and two degrees of freedom in the
parameters. If the number of data is large, the degree of freedom of the model (i.e. the
number of the parameters) has little effect because it is a simple constant. What matters
is twice the dimension of the model, which is multiplied by the number of the data. The
dimension equals the 'internal' degree of freedom of the data, which in turn equals the
expectation of the 'parallel' (or in-direction on the manifold) sum of squares per datum.
Returning to the example, the AIC for a point is

$$\mathrm{AIC}(\text{point}) = \sum e_\parallel^2 + \sum e_\perp^2 + 4 \qquad (13)$$

thus a point is favoured if $\mathrm{AIC}(\text{point}) - \mathrm{AIC}(\text{line}) < 0$, i.e.
$\sum e_\parallel^2 < 2n$, and the algorithm is equivalent to a test of spread along the line.
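
The line-versus-point comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming unit noise variance as in the example, so the residual sums enter the AIC undivided; the function names (`fit_point_and_line`, `aic_line`, `aic_point`, `select`) are hypothetical and not taken from the paper. The perpendicular sum of squares is obtained as the smallest eigenvalue of the 2x2 scatter matrix about the centroid.

```python
import math

def fit_point_and_line(points):
    """Return (sse_point, sse_perp) for 2D points.

    sse_point: sum of squared distances to the centroid (point model).
    sse_perp:  sum of squared perpendicular distances to the orthogonal
               least-squares line through the centroid (line model).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    sse_point = sxx + syy
    # Smallest eigenvalue of the scatter matrix [[sxx, sxy], [sxy, syy]].
    root = math.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
    sse_perp = 0.5 * (sxx + syy) - root
    return sse_point, sse_perp

def aic_line(sse_perp, n):
    # Form of equation (12): dimension d = 1, k = 2 parameters.
    return sse_perp + 2 * n + 4

def aic_point(sse_point):
    # Form of equation (13): dimension d = 0, k = 2 parameters.
    return sse_point + 4

def select(points):
    """Pick 'point' or 'line' by minimum AIC (unit noise variance assumed)."""
    sse_point, sse_perp = fit_point_and_line(points)
    if aic_point(sse_point) < aic_line(sse_perp, len(points)):
        return "point"
    return "line"

if __name__ == "__main__":
    spread_out = [(float(t), 0.0) for t in range(-10, 11)]
    print(select(spread_out))  # parallel spread far exceeds 2n
```

Under these assumptions the decision reduces to comparing the parallel sum of squares (`sse_point - sse_perp`) with `2 * n`, which is the test of spread along the line described above.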

Test results using AIC. Consider the average SSE given in table 2 for 100 data points.
This can be turned into AIC by the addition of 614, 608, 416, and 412, i.e. $2(dn + k)$,
for F, Fa, H and Ha respectively, tabulated in Table 5. It can be seen that on average the
lowest AIC equates to the correct model, although it behaves less well for distinguishing
F from Fa than F from H. Generally the AIC tends to underestimate the dimension of the data
and overestimate the number of motion model parameters. The reason for this can be seen in
the context of the likelihood test explained in the last section. Consider using AIC to
compare two models $\mathcal{R}_1$ and $\mathcal{R}_2$ such that the model with lowest AIC is
accepted, i.e. model 2 is accepted if $\mathrm{AIC}_2 - \mathrm{AIC}_1 < 0$, or if

$$2(L_1 - L_2) < 2(p_1 - p_2) \qquad (14)$$

which can be seen to be directly equivalent to (9). Following this line of thought, the
significance level of the AIC criterion is given by the significance level of the $\chi^2$
distribution with $(p_1 - p_2)$ degrees of freedom and critical value $2(p_1 - p_2)$. For



p_1 - p_2    1      2      3      4      5      6      8      10     12     14     20

alpha        0.156  0.135  0.111  0.091  0.074  0.061  0.042  0.029  0.020  0.014  0.005



Table 3. Calculated values for significance level $\alpha$ given a $\chi^2$ with $(p_1 - p_2)$
degrees of freedom and critical value/threshold $2(p_1 - p_2)$.



$(p_1 - p_2) = 2$, the difference in the number of parameters between H and Ha, this leads to
$\alpha = 0.135$ or a 13.5 percent chance of overfitting; for $(p_1 - p_2) = 3$, the difference
in number of parameters between F and Fa, this leads to $\alpha = 0.11$ or an 11 percent
chance of overfitting. Some typical values are given in table 3. It can be seen that this
is borne out experimentally by scrutinizing Table 5. As $(p_1 - p_2)$ increases the value of
$\alpha$ becomes smaller (less than 0.005 for $(p_1 - p_2) = 20$).
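
The significance levels quoted above can be checked numerically. The sketch below is an illustration only, using a closed-form series for the chi-square upper tail with integer degrees of freedom, built from the standard library; the helper names `chi2_sf` and `aic_alpha` are hypothetical.

```python
import math

def chi2_sf(x, dof):
    """Upper tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    h = 0.5 * x
    if dof % 2 == 0:
        # Q(m, h) = exp(-h) * sum_{j<m} h^j / j!  with m = dof / 2
        term, total = 1.0, 1.0
        for j in range(1, dof // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd dof: start from Q(1/2, h) = erfc(sqrt(h)) and apply
    # Q(a + 1, h) = Q(a, h) + h^a exp(-h) / Gamma(a + 1).
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for j in range((dof - 1) // 2):
        total += term
        term *= h / (j + 1.5)
    return total

def aic_alpha(dp):
    """Implied chance of overfitting when AIC compares two nested models
    differing by dp parameters: chi-square tail beyond 2 * dp."""
    return chi2_sf(2.0 * dp, dp)

if __name__ == "__main__":
    for dp in (1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 20):
        print(dp, round(aic_alpha(dp), 3))
```

The same function also gives the tail for any other threshold, for example `chi2_sf(4.0, 1)` for a penalty of 4 per extra parameter.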



                                  Point Motion

Model Selected      General     Orthographic   Rotation    Affinity
                    (F)         (Fa)           (H)         (Ha)

Fundamental F       707.074     701.037        694.6162    692.378
Affine Fa           1586.350    704.448        1414.389    683.875
Homography H        5402.881    5240.735       609.964     605.132
Affinity Ha         5405.045    5379.894       1435.118    603.643



Table 4. Mean AIC for 100 matches over 100 trials. It can be seen that the
chance of overfitting dimension is small relative to the chance of overfitting the
degree; i.e. average AIC for F lower than for Fa.



Recall that the number of parameters within each model has two components,
$p = k + dn$: the first, $k$, is the number of parameters in the relation; the second, $dn$,
                                  Point Motion

Estimated           General     Orthographic   Rotation    Affinity
                    (F)         (Fa)           (H)         (Ha)

Fundamental F       99          11             0           0
Affine Fa           1           88             0           0
Homography H        0           0              98          15
Affinity Ha         0           0              2           85



Table 5. Number of times each model selected over 100 trials, using AIC for
each of the four motion types. It can be seen that AIC tends to overfit the degree
of the model.



is a number of parameters proportional to the quantity of data. This suggests that a
more general form, the geometric robust information criterion GRIC

$$\mathrm{GRIC} = -2L + \lambda_1 d n + \lambda_2 k \qquad (15)$$

might be appropriate, with $\lambda_1$ and $\lambda_2$ chosen to reduce misfits. Issues in
determining values for these parameters, and suggested values for them, are now discussed.
Higher values of $\lambda_1$ and $\lambda_2$ decrease $\alpha$ but increase $\beta$, decreasing the
power of the test; thus the two parameters should be chosen with an eye to minimizing
$\alpha$ and $\beta$. For the estimation of two view geometry from feature matches
$\lambda_1 = 2$ and $\lambda_2 = 4$ have provided good results over a wide range of conditions.

The first parameter $\lambda_1$ influences the decision as to whether the relation should be
of dimension 3 or 2; for $n = 20$, $\alpha < 0.005$, which may be considered acceptably low.
Thus there is little chance of overfitting the dimension, as this adds a number of
parameters equal to the number of matches, usually sufficiently high to guarantee a low
$\alpha$. Now consider decomposing a general motion into plane plus parallax motions [24]; the
size of $\beta$ depends on the amount of parallax. Setting $\lambda_1 = 2$ means that the amount
of parallax needs to be on average greater than 2.0 pixels to identify a non-homography
relation.

Recall that there is a high probability of overfitting the degree of the relation for two
models of the same dimension. For the second parameter, $\lambda_2 = 4.0$ ensures that for
$(p_1 - p_2) = 1$, $\alpha = 0.0456$, which prevents the tendency to overfit the degree of the
relation whilst not significantly affecting the power of the test (which is dominated by
the choice of $\lambda_1$ for large data sets). The average GRIC is given in Table 6 and the
models selected in Table 7; it can be seen that the GRIC outperforms the standard AIC.
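
As a rough illustration of how the heavier parameter penalty changes the decision, the sketch below evaluates equation (15) for F and Fa on the orthographic case. The values of -2L are not quoted in the text; they are backed out here, as an assumption, by subtracting the AIC penalty 2(dn + k) from the orthographic column of Table 4. The dimensions and parameter counts (d, k) are inferred from the penalties 614 and 608 quoted earlier for n = 100, and the function names are hypothetical.

```python
# (d, k): dimension of the relation and number of relation parameters,
# inferred from the AIC penalties 2(dn + k) = 614 and 608 at n = 100.
MODELS = {"F": (3, 7), "Fa": (3, 4)}

def gric(neg2_log_lik, d, k, n, lam1=2.0, lam2=4.0):
    """Form of equation (15): -2L + lam1 * d * n + lam2 * k."""
    return neg2_log_lik + lam1 * d * n + lam2 * k

def aic(neg2_log_lik, d, k, n):
    """Form of equation (11), i.e. equation (15) with lam1 = lam2 = 2."""
    return gric(neg2_log_lik, d, k, n, lam1=2.0, lam2=2.0)

def best(scores):
    """Name of the model with the lowest score."""
    return min(scores, key=scores.get)

n = 100
# Assumed -2L values: Table 4 (orthographic column) minus 2(dn + k).
neg2L = {"F": 701.037 - 614.0, "Fa": 704.448 - 608.0}

aic_scores = {m: aic(neg2L[m], *MODELS[m], n) for m in MODELS}
gric_scores = {m: gric(neg2L[m], *MODELS[m], n) for m in MODELS}

if __name__ == "__main__":
    print(best(aic_scores), best(gric_scores))
```

With these numbers AIC prefers F on orthographic data while the GRIC prefers Fa, and the two GRIC scores agree with the corresponding entries of Table 6.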

AIC Variants: The fact that the AIC tends to overfit is generally recognized in the
literature. Bozdogan [7] attempts to derive measures that are asymptotically consistent:

$$\mathrm{CAIC} = -2L + p(\log N + 1)$$

$$\mathrm{CAICF} = -2L + p(\log N + 2) + \log|J|$$
                                  Point Motion

Model Selected      General     Orthographic   Rotation    Affinity
                    (F)         (Fa)           (H)         (Ha)

Fundamental F       721.074     715.037        708.6162    706.378
Affine Fa           1592.350    712.448        1422.389    691.875
Homography H        5410.881    5248.735       617.964     613.132
Affinity Ha         5411.045    5385.894       1441.118    609.643



Table 6. Mean GRIC for 100 matches over 100 trials.



                                  Point Motion

Estimated           General     Orthographic   Rotation    Affinity
                    (F)         (Fa)           (H)         (Ha)

Fundamental F       99          1              0           0
Affine Fa           1           98             0           0
Homography H        0           0              98          3
Affinity Ha         0           1              2           97



Table 7. Number of times each model selected over 100 trials, using GRIC for
each of the four motion types.



where $J$ is the information matrix of the estimated parameters. Unfortunately both of
these measures tend to chronically underfit; they have an exact similarity to the BIC
approximation to the Bayes factors discussed in the next section.



6 Bayes Factors 



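Within this section the Bayesian approach to model comparison is introduced.
Suppose that the set of matches $\mathcal{D}$ is to be used to determine between $K$ competing
motion models with relations $\mathcal{R}_1 \ldots \mathcal{R}_K$, with parameter vectors
$\theta_1 \ldots \theta_K$. Then by Bayes' theorem, the posterior probability that
$\mathcal{R}_k$ is the correct relation is

$$\Pr(\mathcal{R}_k \mid \mathcal{D}) = \frac{\Pr(\mathcal{D} \mid \mathcal{R}_k) \Pr(\mathcal{R}_k)}{\sum_{i=1}^{K} \Pr(\mathcal{D} \mid \mathcal{R}_i) \Pr(\mathcal{R}_i)} \qquad (16)$$

Note that by construction $\sum_{i=1}^{K} \Pr(\mathcal{R}_i \mid \mathcal{D}) = 1$. All the
probabilities are implicitly conditional on the set of relations
$\mathcal{R}_1 \ldots \mathcal{R}_K$ being considered. In the case of image matching the models
F, Fa, H, Ha should suffice to completely describe most situations. The marginal
probability $\Pr(\mathcal{D} \mid \mathcal{R}_k)$ is obtained by integrating out $\theta_k$,

$$\Pr(\mathcal{D} \mid \mathcal{R}_k) = \int \Pr(\mathcal{D} \mid \mathcal{R}_k, \theta_k) \Pr(\theta_k \mid \mathcal{R}_k) \, d\theta_k \qquad (17)$$

$$= \int (\text{likelihood} \times \text{prior}) \, d\theta_k \qquad (18)$$

The extent to which the data supports $\mathcal{R}_i$ over $\mathcal{R}_j$ is measured by the
posterior odds,

$$\frac{\Pr(\mathcal{R}_i \mid \mathcal{D})}{\Pr(\mathcal{R}_j \mid \mathcal{D})} = \frac{\Pr(\mathcal{D} \mid \mathcal{R}_i)}{\Pr(\mathcal{D} \mid \mathcal{R}_j)} \cdot \frac{\Pr(\mathcal{R}_i)}{\Pr(\mathcal{R}_j)} \qquad (19)$$

The ratio $\Pr(\mathcal{D} \mid \mathcal{R}_i) / \Pr(\mathcal{D} \mid \mathcal{R}_j)$ is called the Bayes
factor, the term originated by Good, the method attributed by Good to Turing and
Jeffreys [25]. It is similar to the likelihood ratio for model comparison but involves
integration of the probability distributions rather than comparing their maxima. The first
term on the right hand side of (19) is the ratio of two integrals given in (18). The
second factor is the prior odds, which is here set to 1, representing the absence of any
prior preference between the two relations, i.e. $\Pr(\mathcal{R}_i) = \Pr(\mathcal{R}_j)$. Thus
(19) can be rewritten

$$\text{posterior odds} = \text{Bayes factor} \times \text{prior odds} \qquad (20)$$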



6.1 Calculating Bayes Factors 



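In order to compute the Bayes factor, the prior distributions
$\Pr(\theta_k \mid \mathcal{R}_k)$ of each model must be specified. This is both good and bad:
good as it allows the incorporation of prior information (such as the estimate of the
relation from previous frames), bad because these prior densities are hard to obtain when
there is no such information.

The easiest approach is to use the BIC approximation, which assumes that
$\Pr(\theta_k \mid \mathcal{D}, \mathcal{R}_k)$ is approximately normal with mean $\hat{\theta}_k$
and covariance matrix $\Sigma$; the derivation is given in the appendix. The BIC
approximation for the $k$th model is

$$\mathrm{BIC}_k = -2L_k + p \log N \qquad (21)$$

the model with lowest BIC being most likely. Assuming that the prior on the models
$\Pr(\mathcal{R}_k)$ is uniform, the probability of each model may be calculated as

$$\Pr(\mathcal{R}_k \mid \mathcal{D}) = \frac{\exp(-\tfrac{1}{2}\mathrm{BIC}_k)}{\sum_{i=1}^{K} \exp(-\tfrac{1}{2}\mathrm{BIC}_i)} \qquad (22)$$

Given two views, simpler models will be favoured over more complex ones as the
$p \log N$ term will dominate, but as the number of images increases the likelihood
function $\log \Pr(\mathcal{D} \mid \mathcal{R}_k)$ will take precedence.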
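
Turning BIC scores into model probabilities under a uniform model prior can be sketched as follows. This is a minimal illustration rather than the author's implementation; it assumes the convention BIC = -2L + p log N, so that each model is weighted by exp(-BIC/2), and subtracts the smallest BIC before exponentiating purely for numerical stability. The function names are hypothetical.

```python
import math

def bic(log_lik, p, n_obs):
    """BIC score: -2 * log likelihood + p * log(number of observations)."""
    return -2.0 * log_lik + p * math.log(n_obs)

def model_posteriors(bic_by_model):
    """Posterior model probabilities from BIC scores, uniform model prior.

    Weights are exp(-BIC/2); the minimum BIC is subtracted first so that
    the exponentials do not underflow for large scores.
    """
    lowest = min(bic_by_model.values())
    weights = {m: math.exp(-0.5 * (b - lowest)) for m, b in bic_by_model.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

if __name__ == "__main__":
    # Hypothetical scores for two competing models.
    print(model_posteriors({"A": 700.0, "B": 704.0}))
```

Because only differences in BIC matter, a gap of a few hundred units between two models (as arises from the penalty term alone for 100 matches) drives the posterior of the larger model to essentially zero.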

Test results using BIC. The BIC is attractive because of its simple form, which makes it
easy to compute. Unfortunately the BIC approximation consistently underfits the model,
favouring models of too low a dimension. This is due to the poor approximation to
equation (29). Consider 100 matches; then $N = 400$ is the number of observations, and the
number of parameters to estimate is $p = 307$ for the fundamental matrix and 208 for a
homography. This leads to a BIC penalty term of $307 \ln 400$ for F and $208 \ln 400$ for H,
a difference of 593.15. Generally a homography would have to be an exceptionally poor fit
between two views for its BIC to be that much larger than for a fundamental matrix. This
would only occur with very large scaling and very large perspective effects. The problem
comes from the approximation of the log determinant of the Hessian by
$\frac{p}{2} \log N$ (see the appendix), which is generally a bad approximation if
$N < 5p$ [27], but is also a bad approximation for the Hessian of the nuisance parameters.

Other Ways to Approximate Bayes Factors. The crux of the problem with Bayes factors is
the choice of prior $\Pr(\theta_k \mid \mathcal{R}_k)$, as this may influence the result. The
BIC approximation finesses this detail by assuming the prior is very diffuse, whereas
ideally the Bayes factors should be evaluated over a range of priors to check their
stability. Aitkin [1] suggests using the posterior PDF of $\theta$ to compute what he calls
posterior Bayes factors. However, as Akaike points out, "the repeated use of one and the
same sample in the posterior mean of the likelihood function for the definition and
evaluation of posterior density certainly introduces a particular type of bias that
invalidates the use of the mean as the likelihood of the model". Another variation is to
divide the data into two, using one part to estimate a prior and another to perform the
model selection. On data contaminated with outliers this procedure is fraught with peril,
e.g. all the outliers may lie in one of the sets. One can use uniform or diffuse priors,
but great care must be taken when doing this or Lindley's paradox may occur [10], in which
one model is arbitrarily favoured over another. The problem is that flat priors are
specified only up to an undefined multiplicative constant.



6.2 Modified BIC for Least Squares Problems

It is apparent that the penalty term for the number of nuisance parameters due to the
dimension of the relation, and the number of parameters due to the degree of the relation,
should be weighted differently. In the appendix it is explained how the BIC is obtained by
approximating the log determinant of the Hessian (the information matrix or inverse
covariance matrix) by $\frac{p}{2} \log N$. Should the estimation of the parameters be
influenced by all the data then this is a reasonable first order approximation. However,
for the least squares problem this is not the case. The estimation of the optimal match
$\hat{m}$ (the estimation of the nuisance or internal parameters: parameters with $d$ degrees
of freedom, where $d$ is the dimension of the two view relation) is only affected by the 4
noisy coordinates of the match $m$, under the assumption that the matches are independent.
This is characterized by a block diagonal covariance matrix among matches. This suggests
that the geometric Bayesian information criterion GBIC

$$\mathrm{GBIC} = -2L + \log(4)\, d n + \log(N)\, k \qquad (23)$$

might be appropriate, with $N = 4n$ (recall $n$ is the number of matches). This gives very
similar performance to GRIC.
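
To see how differently the criteria weight dimension and degree, the sketch below tabulates the penalty terms alone (everything except -2L) for the four models at n = 100 matches. The (d, k) pairs are inferred from the AIC penalties 614, 608, 416 and 412 quoted earlier; the GRIC weights are the suggested values 2 and 4; the function and dictionary names are hypothetical.

```python
import math

# (d, k) per model, inferred from the quoted AIC penalties at n = 100.
MODELS = {"F": (3, 7), "Fa": (3, 4), "H": (2, 8), "Ha": (2, 6)}

def penalties(d, k, n):
    """Penalty terms of each criterion, excluding the common -2L term."""
    n_obs = 4 * n                      # N = 4n observations
    p = d * n + k                      # total number of parameters
    return {
        "AIC": 2.0 * p,                                    # equation (11)
        "BIC": p * math.log(n_obs),                        # equation (21)
        "GRIC": 2.0 * d * n + 4.0 * k,                     # equation (15)
        "GBIC": math.log(4.0) * d * n + math.log(n_obs) * k,  # equation (23)
    }

if __name__ == "__main__":
    for name, (d, k) in MODELS.items():
        row = penalties(d, k, 100)
        print(name, {c: round(v, 2) for c, v in row.items()})
```

The BIC penalty gap between F and H comes out at about 593, as quoted in the text, whereas under GBIC the gap is governed mainly by the `log(4) * d * n` term and is several times smaller.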

7 The Quest for the Universal Prior: MDL 

Within this section the minimum description length principle is outlined, based upon the
idea of parsimony: that the model that requires the easiest coding is best. This principle
has been one of the main rules of science since its inception. "We are to admit no more
causes of natural things than such as are both true and sufficient to explain their
appearances. To this purpose the philosophers say that Nature does nothing in vain, and
more is in vain when less will serve; for Nature is pleased with simplicity; and affects
not the pomp of superfluous causes." (Newton, Principia, 1726, 398). The key intuitive
idea is that the simplest description of the process will asymptotically (as the number of
observations of that process becomes infinite) be functionally equivalent to the true one.

Rissanen [42] (1978) developed a criterion with similar form to the BIC from a totally
different standpoint. He derived the minimum-bit representation of the data, termed SSD
(shortest description length) and MDL (minimum description length), an approach suggested
by the idea of algorithmic complexity (Solomonoff [48] and Kolmogorov [29]). Wallace and
Boulton [58] developed a very similar idea to MDL called the minimum message length (MML)
approach. The criteria MDL and MML are based on minimum code lengths: given the data
represented up to a finite precision, one picks the parameters so that the model they
define permits the shortest possible code length. The code length is the sum of the code
length for the model and the code length for the data given the model, i.e. the error (the
two are directly analogous to the log prior and log likelihood). Given that it takes
approximately $-\log_2 \Pr(x)$ bits to encode a number $x$, it becomes clear that the most
frequently occurring observations should be given the smallest code lengths; hence MDL
methods are integrally linked with Bayesian methods. Bayesian attempts at model selection
can be stymied by the need for prior distributions, which are the Bayesian's philosopher's
stone; MDL appears to promise such a universal prior for the complexity/parsimony of the
model. However the term derived

$$\mathrm{MDL} = -2L + p \log N \qquad (24)$$

has the same form and deficiencies as the BIC. It is only a first order approximation to
the optimal code length. Wallace and Freeman [59] (1987) further developed and expanded
MML, leading to a very similar criterion to Bozdogan's CAICF.

Test results using MDL: Generally MDL criteria are the same as Bayesian and produce
similar results.

8 Bayesian Model Selection and Model Averaging 

A criterion not considered for model selection here is inductive reasoning. It dates back
at least to the Greek philosopher Epicurus (342?-270? BC), who proposed the following
approach: "Principle of Multiple Explanations: If more than one theory is consistent with
the observations, keep all theories" [32]. Thus, as pointed out by his follower Lucretius
(95-55 BC), if there are several explanations as to why a man died, but one does not
definitively know which one is true, then Lucretius (and later Solomonoff [32]) advocates
maintaining them all for the purpose of prediction (somewhat the opposite of Ockham's
point of view).

So far we have described methods in which all the potential models are fitted and a model
selection procedure is used to select which is best, which is then adopted in preference
to other defensible models. This is somewhat arbitrary ("a quiet scandal" [8]): first
admitting that there is model uncertainty by searching for a "best" model, and then
ignoring this uncertainty by making inferences and forecasts as if it were certain that
the chosen model is actually correct. Selection of just one model means that the
uncertainty on some parameters is underestimated [21, 36], as will now be demonstrated.
Hjorth distinguishes between global parameters, which are defined for all models, and local
parameters, which are not. Consider figure 2, depicting a two dimensional slice of
parameter space for two global parameters. As the 2 parameters are altered (fixing for a
moment all other parameters) the model selection criterion (e.g. AIC) will lead to
different models being selected, given fixed data. This will not (significantly) affect
the uncertainty of the parameter estimate at point A in figure 2 as it is away from a
boundary; but point B will have its uncertainty incorrectly estimated unless the fact that
multiple models are being considered is taken into account, as the model parameterization
will change over the boundary between model 1 and model 2. Furthermore there is no chance
for model 1 to take values of the parameters within the regions that AIC allocates to
model 2 or 3 for this data set, even if that is the correct answer.

Hjorth [21] gives a rather neglected bias theorem which is intuitively obvious: that the
expected value of the MAIC will be less than the minimum of the expected value of the AIC
for all the models under consideration. Thus if model $i$ were the true model, the overall
expectation of the MAIC would be less than $\mathrm{AIC}_i$ (as occasionally the wrong model
is selected because it has lower AIC). This in turn means that the residuals and
covariance are slightly lower than would be expected.




Fig. 2. Two cases, where the uncertainty in the model is unimportant,
where the uncertainty in the model becomes important



Bayesian model averaging takes model uncertainty into account explicitly by representing
the data (each set of matches) by a combination of models [11, 22, 27, 30, 37]. Rather
than using one motion model, a standard Bayesian formalism is adopted which averages the
posterior distributions of the prediction under each model, weighted by their posterior
model probabilities.

One of the main advantages of model averaging comes in prediction. If a new match is
observed then its likelihood is computed as a weighted average over all the models, and it
has been shown that model averaging can produce superior results in predictive performance
than commitment to a single model [34]. Suppose that there are $K$ competing motion models
with relations $\mathcal{R}_1 \ldots \mathcal{R}_K$ with parameter vectors
$\theta_1 \ldots \theta_K$ that could describe $\mathcal{D}$. Then Bayesian inference about
$m_i$ is based on its posterior distribution, which is

$$\Pr(m_i \mid \mathcal{D}) = \sum_{k=1}^{K} \Pr(m_i \mid \mathcal{R}_k, \mathcal{D}) \Pr(\mathcal{R}_k \mid \mathcal{D}) \qquad (25)$$

by the law of total probability [30]. Thus the full posterior probability of $m_i$ is a
weighted average of its posterior distribution under each of the models, where the weights
are the posterior model probabilities $\Pr(\mathcal{R}_k \mid \mathcal{D})$ derived in (22).
Equation (25) provides inference about $m_i$ that takes into full account the uncertainty
between models, and is particularly useful when two of the models describe the data
equally well.
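
The weighted average of equation (25) can be sketched for the likelihood of a single new match. This is an illustration under stated assumptions: each model is taken to yield a residual for the new match, the residual is scored with a zero-mean Gaussian of known standard deviation, and the posterior model probabilities are supplied by the caller. The function names and the residual values are hypothetical.

```python
import math

def gaussian_likelihood(residual, sigma):
    """Zero-mean Gaussian density evaluated at a residual."""
    z = residual / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def averaged_likelihood(residual_by_model, posterior_by_model, sigma=1.0):
    """Model-averaged likelihood of a new match.

    Sums the per-model likelihoods weighted by the posterior model
    probabilities, following the form of equation (25).
    """
    return sum(
        posterior_by_model[m] * gaussian_likelihood(residual_by_model[m], sigma)
        for m in posterior_by_model
    )

if __name__ == "__main__":
    # Hypothetical residuals of one new match under two competing models.
    residuals = {"F": 0.4, "H": 2.5}
    posteriors = {"F": 0.6, "H": 0.4}
    print(averaged_likelihood(residuals, posteriors))
```

The averaged value always lies between the smallest and largest single-model likelihoods, so a match that is poorly explained by the favoured model is not discarded outright when another plausible model explains it well.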

Test results using model averaging. As there is no model selection procedure, a slightly
different test criterion had to be used. The results for model averaging were assessed by
examining how points were correctly classified as inlying or outlying, using the
combination of models for the classification (elsewhere model averaging is assessed for
segmentation [55]). Overall the results were disappointing, with there being very little
difference between the model averaging approach and using the model suggested by the model
selection procedure used to generate the posterior distribution over the models (AIC and
BIC were tried).

9 Results 

All of the model selection procedures have been implemented and compared on a large test
bed of synthetic and real image pairs.

Synthetic Data. Three synthetic databases of one hundred sets of 100 synthetic matches,
with 10-30% outliers, are generated to be consistent with either random F, Fa, H or Ha.
Each of the model selection algorithms was run on each set and the model chosen compared
with the known ground truth.

Real Data. The algorithms have been tested on many images; here two typical image pairs
are shown. In all the examples tested for this paper, the corners are obtained by using
the detector described in [18], and the matching procedure uses cross correlation in a
square search window. The standard deviation of the error of the point correspondences was
estimated robustly [57]. For model comparison, testing is somewhat easier than in the
general case of estimation, as
Fig. 3. Left images, Indoor sequence, camera translating and rotating to fixate on
the house; Right Images, two views of a buggy rotating on a table. With disparity
vectors for features superimposed.



all that needs to be known of the ground truth are some qualitative aspects in order to
determine which model is the true one; i.e. if the motion is general and there are
discernible perspective effects then F is the true model, if the camera rotates about its
optic centre then H is the true model, if the scene is distant so as to be near
orthographic then Fa is the true model, and if all the points lie on a distant plane then
Ha is the true model; also if the focal length is long and the camera rotates, Ha is
appropriate. Two typical image pairs from the database are the buggy and model house.

Buggy. The left two images of Figure 3 are two views of a buggy rotating on a turn table;
good orthographic but not perspective structure can be generated for this scene, and the
correct (or ground truth) model should be Fa.

Model house data. The right two images of Figure 3 show a scene in which a camera rotates
and translates whilst fixating on a model house. The scene is general motion; this is
because the translational and rotational components of the camera motion were both
significant, and full perspective structure can be recovered.

Summary of Results

1 Likelihood ratio test: only really appropriate for comparison of two nested models.

2 AIC tends to correctly reveal the dimension but overfits the degree of the relation.

3 BIC and MDL tend to greatly underfit the dimension and degree.

4 The geometric robust information criterion GRIC (15) with $\lambda_1 = 2.0$,
$\lambda_2 = 4.0$ and $\lambda_3 = 4.0$ produced the most consistently good results, so far.

5 Model averaging produced little improvement, together with increasing the amount of
computation necessary to estimate likelihoods.

Results of using the robust model selector GRIC on the real images are given in Table 8.



                        Estimated Motion of Points

               n     General   Orthographic   Homography   Affinity

Model House    80    596*      618            652          755
Buggy          167   1221      1190*          1240         1450



Table 8. GRIC values for the images. The model with lowest GRIC is marked
with an asterisk.



10 Discussion 

Another popular class of methods for model selection includes cross validation,
jackknifing, bootstrap, and data splitting [12, 21], in which the data is divided into two
parts: one is used to fit the relation and the other is used to evaluate the goodness of
fit. However such procedures are very computationally intensive and highly sensitive to
outliers, and results on real images are poor. In the asymptotic case Stone [49]
demonstrates that AIC and cross validation are equivalent. The results are somewhat better
in the outlier free case, but removal of the outliers presupposes knowledge of the correct
relation for the data.

It has been observed that the construction of a prior distribution for each relation is
the crux of determining the Bayes factors, and yet we are not without prior information
about the distributions of such things as F. It is possible to construct this prior by
introducing our subjective prior knowledge (like a true Bayesian) of camera calibration
and the range of possible camera motions; e.g. it might be known that the principal point
is roughly in the centre of the image, that the aspect ratio is roughly unity and the focal
length lies within a certain range. It might also be known that the camera was hand held
and moving at roughly walking pace. By assigning Gaussian probability distributions with
meaningful parameters to these, Monte Carlo sampling methods [15] can be used to generate
the prior for each relation, and hence the Bayes factor. It is then possible to use (29)
directly to calculate the Bayes factor. Kass and Raftery [27] review a large number of
such Monte Carlo style techniques for estimating Bayes factors.

Another interesting question is whether it is necessary to calculate all the models prior
to model selection. All the approaches advocated involve estimation of all putative
models, which is costly in computation time. Experiments have been made with the use of
covariance matrices (non-robust and robust [43]) in the hope that by fitting the most
general model the covariance matrix might reveal if there are degeneracies. This works
reasonably well if there are few outliers in the data but poorly otherwise. The alternative
approach is to fit the least general model and assess its goodness of fit, only
progressing to a more general model if the fit is bad. However, again this is problematic
in the case where there are outliers in the data: it is hard to distinguish a set of
outliers to a low order model from a set of inliers to the high order model without
fitting both models. To quote from William Blake, "The road of excess leads to the palace
of wisdom; You never know what is enough unless you know what is more than enough"
(Proverbs of Hell).

Now a warning. After model selection, estimates of model parameters and of the residual
variances are likely to be biased. Generally model selection biases are hard to quantify,
but are characterized by deflation of covariance estimates. The type of bias will very
much depend on the algorithm used to perform the model selection (as well as the criterion
of model selection). To quote Pearl [39], "It would, therefore, be more appropriate to
connect credibility with the nature of the selection procedure rather than the properties
of the final product. When the former is not explicitly known simplicity merely serves as
a rough indicator for the type of processing that took place prior to the discovery". The
quantification of these biases is an ongoing topic for research.

11 Conclusion 

Generally there are two problem areas in computer vision; the first is of finding the
correct representation of the data, and the second is manipulating that representation to
make decisions and form hypotheses about the world. Model selection lies within both areas
and it is critical to the design of computer vision algorithms, and yet often neglected.
This can lead to bias in estimation. For instance, there are a large number of papers
about the uncertainty of the fundamental matrix (e.g. [9, 54, 60]) that ignore the fact
that there is uncertainty in the choice of model itself.

Several methods for model selection in the least squares regression problem have been
reviewed and it has been shown that (a) care must be taken to count the degrees of freedom
in the model, and (b) that careful distinction must be made between internal (nuisance)
and external parameters.

Finally, it can be seen that there are several different model selection methodologies
discussed in this paper. It is apparent that although the model selection paradigms arose
in different areas they have great similarities, as if they are shadows of some greater
theory to come; the discovery of this theory is a line for future thought. Meanwhile only
by incorporating and understanding model selection algorithms (which are merely mechanisms
for making inference) into computer vision algorithms can progress be made towards fully
automated systems.
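Appendix: The BIC Approximation

The easiest approach to calculating Bayes factors is to use the BIC approximation, in
which it is assumed that $\Pr(\theta_k \mid \mathcal{D}, \mathcal{R}_k)$ is approximately normal
with mean $\hat{\theta}_k$ and covariance matrix $\Sigma$. The mean and covariance can be
estimated as follows. Let

$$\phi(\theta_k) = -\log\left(\Pr(\mathcal{D} \mid \theta_k, \mathcal{R}_k) \Pr(\theta_k \mid \mathcal{R}_k)\right) \qquad (26)$$

then $\hat{\theta}_k$ is the estimate of $\theta_k$ such that $\phi(\theta_k)$ is minimized at
$\hat{\theta}_k$, with the Hessian there equal to $\Sigma^{-1}$, i.e. the covariance is
estimated as the inverse Hessian. Performing a Taylor expansion around $\hat{\theta}_k$ gives

$$\Pr(\mathcal{D} \mid \mathcal{R}_k) = \int \exp(-\phi(\theta_k)) \, d\theta_k$$

$$\approx \exp(-\phi(\hat{\theta}_k)) \int \exp\left(-\tfrac{1}{2} (\theta_k - \hat{\theta}_k)^\top \Sigma^{-1} (\theta_k - \hat{\theta}_k)\right) d\theta_k$$

$$= \exp(-\phi(\hat{\theta}_k)) \, (2\pi)^{p/2} |\Sigma|^{1/2}$$

The last step in the equation above is a standard result for the integration of
multi-variate Gaussians. Thus

$$\log \Pr(\mathcal{D} \mid \mathcal{R}_k) \approx L + \frac{p}{2} \log 2\pi + \frac{1}{2} \log|\Sigma| + \log \Pr(\hat{\theta}_k \mid \mathcal{R}_k) \qquad (27)$$

which is Laplace's approximation [6, 33]. From this, various approximations can be made
leading to calculable model selection criteria. Perhaps the simplest is given by Schwarz,
who approximates the prior by a normal distribution with mean $\theta_{\text{prior}}$ and
covariance $\Sigma_{\text{prior}}$, leading to

$$\Pr(\theta_k \mid \mathcal{R}_k) = (2\pi)^{-p/2} |\Sigma_{\text{prior}}|^{-1/2} \exp\left(-\tfrac{1}{2} (\theta_k - \theta_{\text{prior}})^\top \Sigma_{\text{prior}}^{-1} (\theta_k - \theta_{\text{prior}})\right) \qquad (28)$$

thus

$$\log \Pr(\mathcal{D} \mid \mathcal{R}_k) \approx L - \frac{1}{2} (\hat{\theta}_k - \theta_{\text{prior}})^\top \Sigma_{\text{prior}}^{-1} (\hat{\theta}_k - \theta_{\text{prior}}) - \frac{1}{2} \log|\Sigma_{\text{prior}}| + \frac{1}{2} \log|\Sigma|$$

Now $\Sigma^{-1} = \Sigma_{\text{prior}}^{-1} + \Sigma_L^{-1}$, where $\Sigma_L$ is the inverse of the
Hessian of the log-likelihood at $\hat{\theta}_k$, leading to

$$\log \Pr(\mathcal{D} \mid \mathcal{R}_k) \approx L - \frac{1}{2} (\hat{\theta}_k - \theta_{\text{prior}})^\top \Sigma_{\text{prior}}^{-1} (\hat{\theta}_k - \theta_{\text{prior}}) - \frac{1}{2} \log\left|I + \Sigma_{\text{prior}} \Sigma_L^{-1}\right| \qquad (29)$$

If the prior is assumed to be very diffuse then Schwarz [46] suggests discounting the
second term and approximating the log determinant term by $\frac{p}{2} \log N$ to get

$$\log \Pr(\mathcal{D} \mid \mathcal{R}_k) \approx \log \Pr(\mathcal{D} \mid \hat{\theta}_k, \mathcal{R}_k) - \frac{p}{2} \log N = L - \frac{p}{2} \log N \qquad (30)$$

where $p = dn + k$ is the total number of parameters in the system, $N = 4n$ is the total
number of observations, $n$ the number of matches, and $d$ the dimension of $\mathcal{R}$. Twice
the negative of this, $-2L + p \log N$, is referred to as the Bayesian Information Criterion
or BIC and can be used to compare the likelihood of competing models.

General comments. The BIC diverges from the full Bayesian viewpoint by discounting the
prior term in (30). The log determinant of the Hessian encodes the uncertainty in the
parameter estimates (the smaller this value the greater the precision in the estimate). If
available, the Hessian itself should be calculated and used.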

References 

[1] M. A. Aitkin. Posterior Bayes Factors. J. R. Statist. Soc. B, 53(1):111-142, 1991.

[2] H. Akaike. A new look at the statistical model identification. IEEE Trans. on
Automatic Control, Vol. AC-19(6):716-723, 1974.

[3] H. Akaike. Factor analysis and AIC. Psychometrika, 52(3):317-332, 1987.

[4] P. Beardsley, P. H. S. Torr, and A. Zisserman. 3D model acquisition from extended
image sequences. In B. Buxton and R. Cipolla, editors, Proc. 4th European
Conference on Computer Vision, LNCS 1065, Cambridge, pages 683-695.
Springer-Verlag, 1996.

[5] P. Beardsley, P. H. S. Torr, and A. Zisserman. 3D model acquisition from extended
image sequences. In B. Buxton and R. Cipolla, editors, Proc. 4th European
Conference on Computer Vision, LNCS 1065, Cambridge, pages 683-695.
Springer-Verlag, 1996.

[6] C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford,
1995.

[7] H. Bozdogan. Model selection and Akaike's information criterion (AIC): The
general theory and its analytical extensions. Psychometrika, 52(3):345-370, 1987.

[8] C. Chatfield. Model uncertainty, data mining and statistical inference.
J. R. Statist. Soc. A, 158:419-466, 1995.

[9] G. Csurka, C. Zeller, Z. Zhang, and O. Faugeras. Characterizing the uncertainty
of the fundamental matrix. Computer Vision and Image Understanding, 68(1):18-36, 1996.

[10] M. DeGroot. Optimal Statistical Decisions. McGraw-Hill, 1970.

[11] D. Draper. Assessment and propagation of model uncertainty (with discussion).
Journal of the Royal Statistical Society series B, 57:45-97, 1995.

[12] B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman and
Hall, London, UK, 1993.

[13] O. D. Faugeras. What can be seen in three dimensions with an uncalibrated stereo
rig? In G. Sandini, editor, Proc. 2nd European Conference on Computer Vision,
LNCS 588, Santa Margherita Ligure, pages 563-578. Springer-Verlag, 1992.

[14] R. A. Fisher. Uncertain inference. Proc. Amer. Acad. Arts and Sciences, 71:245-258,
1936.

[15] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis.
Chapman & Hall, New York, 1995.

[16] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic
Press, 1981.

[17] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust
Statistics: An Approach Based on Influence Functions. Wiley, New York, 1986.

[18] C. Harris. Structure-from-motion under orthographic projection. In O. Faugeras,
editor, Proc. 1st European Conference on Computer Vision, LNCS 427, pages
118-128. Springer-Verlag, 1990.

[19] R. I. Hartley. Estimation of relative camera positions for uncalibrated cameras. In
G. Sandini, editor, Proc. 2nd European Conference on Computer Vision, LNCS 588,
Santa Margherita Ligure, pages 579-87. Springer-Verlag, 1992.

[20] R. I. Hartley and P. Sturm. Triangulation. In ARPA Image Understanding
Workshop, pages 957-966, 1994.

[21] U. Hjorth. On model selection in the computer age. J. Statist. Planning and
Inference, 23:101-115, 1989.

[22] J. S. Hodges. Uncertainty, policy analysis and statistics (with discussion).
Statistical Science, 2:259-291, 1987.

[23] P. J. Huber. Robust Statistics. John Wiley and Sons, 1981.

[24] M. Irani, P. Anandan, and D. Weinshall. From reference frames to reference
planes: Multi-view parallax geometry and its applications. In H. Burkhardt and
B. Neumann, editors, Proc. 5th European Conference on Computer Vision,
LNCS 1406, Freiburg, pages 829-846. Springer-Verlag, 1998.

[25] H. Jeffreys. Theory of Probability. Clarendon Press, Oxford, third edition, 1961.

[26] K. Kanatani. Statistical Optimization for Geometric Computation: Theory and
Practice. Elsevier Science, Amsterdam, 1996.

[27] R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical
Association, 90:773-795, 1995.

[28] M. Kendall and A. Stuart. The Advanced Theory of Statistics. Charles Griffin
and Company, London, 1983.

[29] A. N. Kolmogorov. Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1:4-7, 1965.

[30] E. E. Leamer. Specification Searches: Ad Hoc Inference with Nonexperimental Data.
Wiley, New York, 1978.

[31] I. J. Leontaritis and S. A. Billings. Model selection and validation methods for 

non-linear systems. N N L, 45(1):311-341, 1987. 

[32] M. Li and P. Vitanyi. n intro u tion to o o oro o p it n its pp i 
tions. Springer-Verlag, 1997. 

[33] D. V. Lindley. Approximate Bayesian methods. In J. M. Bernardo, M. H. De- 
Groot, D. V. Lindley, and A. F. M. Smith, editors, si n t tisti s, pages 
223-237, Valencia, 1980. Valencia University Press. 

[34] D. Madigan and A. E. Raftery. Model selection and accounting for model un- 
certainty in graphical models using Occam’s window. ourn o t ri n 

t tisti sso i tion, 89:1535-1546, 1994. 

[35] G.I. McLachlan and K. Basford. i tur o s in r n n pp i tions to 

ust rin . Marcel Dekker. New York, 1988. 

[36] A. J. Miller. Selection of subsets of regression variables (with discussion), ourn 
0 t o t tisti o i t ( ri s ), 147:389-425, 1984. 

[37] B. R. Moulton. A Bayesian-approach to regression selection and estimation with 

application to a price-index for radio services, ourn o ono tri s, 49:169- 

193, 1991. 




Model Selection for Two View Geometry: A Review 301 



[38] J. Mundy and A. Zisserman. o tri n ri n in o put r ision. MIT 
press, 1992. 

[39] J. Pearl. On the connection between the complexity and credibility of inferred 
models, nt nr st s, 4:255-264, 1978. 

[40] K. Pearson. On lines and planes of closest fit to systems of points in space. i os 

r 6, 2:559, 1901. 

[41] B. D. Ripley. tt rn r o nition n n ur n t orks. Cambridge University 
Press, Cambridge, 1996. 

[42] J. Rissanen. Modeling by shortest data description. uto ti , 14:465-471, 
1978. 

[43] P. J. Rousseeuw. o ust r ssion n utir t tion. Wiley, New York, 
1987. 

[44] B. Russell, istor o st rn i osop . Routledge, 1961. 

[45] K. Schittowski. NLQPL: A FORTRAN-subroutine solving constrained nonlinear 
programming problems. nn s o p r tions s r , 5:485-500, 1985. 

[46] G. Schwarz. Estimating dimension of a model. nn t t , 6:461-464, 1978. 

[47] G.A.F. Wild C. J. Seber. Non Lin r r ssion. Wiley, New York, 1989. 

[48] R. Solomonoff. A formal theory of inductive inference i. n or tion n ontro , 
7:1-22, 1964. 

[49] M. Stone. An asymptotic equivalence of choice of model by cross-validation and 
Akaike’s criterion. o t tist o , 39:44-47, 1977. 

[50] P. H. S. Torr. utir t tion n otion nt tion. PhD thesis, Dept, of 

Engineering Science, University of Oxford, 1995. 

[51] P. H. S. Torr. Geometric motion segmentation and model selection. In J. Lasenby, 
A. Zisserman, R. Cipolla, and H. Longuet-Higgins, editors, i osop i r ns 

tions o t o o i t , pages 1321-1340. Roy Soc, 1998. 

[52] P. H. S. Torr, A. FitzGibbon, and A. Zisserman. Maintaining multiple motion 
model hypotheses through many views to recover matching and structure. In 
U Desai, editor, 6, pages 485-492. Narosa Publishing House, 1998. 

[53] P. H. S. Torr and D. W. Murray. The development and comparison of robust 
methods for estimating the fundamental matrix, nt oum o o put r ision, 
24(3):271-300, 1997. 

[54] P. H. S. Torr and D. W. Murray. The development and comparison of robust 
methods for estimating the fundamental matrix. , 24(3):271-300, 1997. 

[55] P. H. S. Torr and A. Zisserman. Concerning bayesian motion segmentation, model 

averaging, matching and the trifocal tensor. In H. Burkharddt and B. Neumann, 
editors, 9 o , pages 511-528. Springer, 1998. 

[56] P. H. S. Torr and A. Zisserman. Robust computation and parametrization of mul- 
tiple view relations. In U Desai, editor, 6, pages 727-732. Narosa Publishing 

House, 1998. 

[57] P. H. S. Torr, A Zisserman, and S. Maybank. Robust detection of degenerate 

configurations for the fundamental matrix. , 71(3):312-333, 1998. 

[58] C.S. Wallace and D.M. Boulton. An information measure for classification. o 
put r oum , ll(2):195-209, 1968. 

[59] C.S. Wallace and P.R. Freeman. Estimation and inference by compact coding. 

t tist o , 49(3):240-265, 1987. 

[60] Z. Zhang. Determining the epipolar geometry and its uncertainty: A review. 

, 27(2):161-195, 1997. 




Finding Objects by Grouping Primitives



David Forsyth, John Haddon, and Sergey Ioffe



Computer Science Division, U.C. Berkeley, Berkeley, CA 94720, USA 
daf,ioffe,haddon@cs.berkeley.edu,
http://www.cs.berkeley.edu/~daf,ioffe,haddon



Abstract. Digital library applications require very general object recog- 
nition techniques. We describe an object recognition strategy that op- 
erates by grouping together image primitives in increasingly distinctive 
collections. Once a sufficiently large group has been found, we declare 
that an object is present. We demonstrate this method on applications 
such as finding unclothed people in general images and finding horses 
in general images. Finding clothed people is difficult, because the vari- 
ation in colour and texture on the surface of clothing means that it is 
hard to find regions of clothing in the image. We show that our strategy 
can be used to find clothing by marking the distinctive shading patterns 
associated with folds in clothing, and then grouping these patterns. 



1 Background 

Several typical collections containing over ten million images are listed in [6].
There is an extensive literature on obtaining images from large collections using
features computed from the whole image, including colour histograms, texture
measures and shape measures; significant papers include [9, 13, 16, 21, 24, 25,
27, 30, 31, 36, 37, 38, 39, 42].

However, in the most comprehensive field study of usage practices (a paper
by Enser [6] surveying the use of the Hulton Deutsch collection), there is
a clear user preference for searching these collections on image semantics; typical
queries observed are overwhelmingly oriented toward object classes (“dinosaurs”,
p. 40; “chimpanzee tea party, early”, p. 41) or instances (“Harry Secombe”,
p. 44; “Edward Heath gesticulating”, p. 4). An ideal search tool would be
a quite general recognition system that could be adapted quickly and easily to the
types of objects sought by a user. Building such a tool requires a much more
sophisticated understanding of the process of recognition than currently exists.

Object recognition will not be comprehensively solved in the foreseeable
future. Solutions that are good enough to be useful for some cases in applications
are likely, however. Querying image collections is a particularly good application,
because in many cases no other query mechanism is available: there is
no prospect of searching all the photographs by hand. Furthermore, users are
typically happy with low recall queries; in fact, the output of a high recall search
for “The President” of a large news collection would be unusable for most
application purposes. This proposal focuses on areas that form a significant subset
of these queries, where useful tools can be reasonably expected.

Discussing recognition requires respecting a distinction between two important,
and subtly different, problems: finding, where the image components that
result from a single object are collected together; and naming, where the particular
name of a single isolated object is determined. Finding is not well defined,
because objects are not well defined. For example, would one regard the image
components corresponding to an ear or an eye as separate objects that comprise
a face, or do these components belong together as part of a single indissoluble
object?



2 Primitives, Segmentation, and Implicit Representations 

Writings on object recognition have tended to concentrate on naming problems.
For some types of object or scene, finding can be avoided by quite simple
techniques. For example, for small numbers of geometrically exact object models,
search is effective [7, 14, 18, 22, 26, 28, 29, 34, 40]; and for isolated objects,
finding is irrelevant.

However, in many applications finding is an important component of the
problem; often, the name of an object is required only at a very limited level of
detail (“person”, “big cat”, etc.). While naming is not an easy problem, quite
good solutions appear possible with extensions of current pose-based techniques.

There are several reasons finding is very difficult and poorly understood.
Finding is essentially segmentation writ large, using generic cues (like coherence in
colour and texture, used by current work on segmentation) initially and high
level knowledge later to obtain regions that should be recognised together. However,
deciding which bits of the image belong together and should be recognised
together requires knowledge of object properties. As a result, finding involves
deploying object knowledge to direct and guide segmentation; but how is the
right piece of knowledge to be used in the right place? One wishes to recognize
objects at a class level, independent of geometric detail, so that finding algorithms
should be capable of abstraction. For example, most quadrupeds have roughly
the same body segments in roughly the same place; good finding algorithms
would exploit this fact before, say, measuring the distribution of musculature on
each segment or the number of hairs on an ear. Finally, a sensible approach to
finding should use representations that are robust to the effects of pose and of
internal degrees of freedom, such as joints.

If we use the word primitive more loosely, to mean a feature or assembly
of features that has a constrained, stylised appearance, then a representation
based around primitives at many levels has the great advantage that, at each
stage of finding, a program can know what it is looking for. For example, horses
can be represented (crudely!) as assemblies of hide-coloured cylinders; this
results in a finding process that first looks for hide-like regions; then finds edge
points, and uses geometrical constraints to assemble sets of edge points that
could have come from cylinders; and finally reasons about the configuration
of the cylinders. At each stage there are few alternatives to choose from, which
means the search is efficient; and, while each individual test is weak, the collective
of tests in sequence can be quite powerful. The choice of primitives and the order
and nature of assembly routines together form an implicit representation:
a representation of an object as a finding process, which functions as a source of
top-down knowledge.

We now have some insight into what should be a primitive. Primitives should
have stereotyped appearance. The most useful form of primitive is one where
it is possible to test an assembly of image features and say whether it is likely to
have come from a primitive or not. For example, it is known that such tests are
easy for surfaces of revolution, straight homogeneous generalised cylinders, canal
surfaces, and cylinders [32, 33, 43]. As a result, it is possible to segment image
regions that are likely to correspond to such surfaces without knowing to what
object they belong^1. A second feature of a useful primitive is that it is significant.
For example, a cylinder is a significant property, because many objects are, at a
crude level, made of cylinders. A third useful property is robustness; cylindrical
primitives are quite easy to find even in the presence of some deformations. These
properties mean that finding objects that are assemblies of primitives essentially
involves finding the primitives, and then reasoning about their assembly. As we
have indicated, previous work has typically concentrated on parsing activities
(which assume that finding has already occurred); this proposal concentrates on
finding.

2.1 Body Plans - Interim Results on Implicit Representations 

A natural implicit representation to use for people and many animals is a body
plan: a sequence of grouping stages, constructed to mirror the layout of body
segments. These grouping stages assemble image components that could
correspond to appropriate body segments or other components (as in figure 1, which
shows the plan used as an implicit representation of a horse). Having a sequence
of stages means the process is efficient: the process can start with checking
individual segments and move to checking multi-segment groups, so that not
all groups of four (or however many, for the relevant body plan) segments are
presented to the final classifier. We have done extensive experiments with two
separate systems that use the same structure:

- Images are masked for regions of appropriate colour and texture.

- Roughly cylindrical regions of appropriate colour and texture are identified.

- Assemblies of regions are formed and tested against a sequence of predicates.

The first example identifies pictures containing people wearing little or no
clothing, to finesse the issue of variations of appearance of clothing. This program
has been tested on an unusually large and unusually diverse set of images; on a test
collection of 565 images known to contain lightly clad people and 4289 control

^1 While current techniques for finding generalised cylinders are fragile, because they
winnow large collections of edges to find subsets with particular geometric properties
and so are overwhelmed by images of textured objects, the principle remains. We
indicate an attack on this difficulty below.
Fig. 1. The body plan used for horses. Each circle represents a classifier, with
an icon indicating the appearance of the assembly. An arrow indicates that the 
classifier at the arrowhead uses segments passed by the classifier at the tail. The 
topology was given in advance. The classifiers were then trained using image data 
from a total of 38 images of horses. 



images with widely varying content, one tuning of the program marked 241 test
images and 182 control images (the performance of various different tunings is
indicated in figure 3; more detailed information appears in [12, 10]). The recall is
comparable with full-text document recall [3, 4, 35] (which is surprisingly good
for so abstract an object recognition query) and the rate of false positives is
satisfactorily low. In this case, the representation was entirely built by hand.

The second example used a representation whose combinatorial structure
(the order in which tests were applied) was built by hand, but where the tests
were learned from data. This program identified pictures containing horses, and
is described in greater detail in [11]. Tests used 100 images containing horses,
and 1086 control images with widely varying content. The geometric process
makes a significant difference, as figure 2 illustrates. The performance of various
different configurations is shown in figure 3. For version “F”, if one estimates
performance omitting images used in training and images for which the segment
finding process fails, the recall is 15% (i.e. about 15% of the images containing
horses are marked) and control images are marked at the rate of approximately

Fig. 2. Typical images with large quantities of hide-like pixels (white pixels are
not hide-like; others are hide-like) that are classified as not containing horses,
because there is no geometric configuration present. While the test of colour and
texture is helpful, the geometric test is important, too, as the results in figure 3
suggest. In particular, the fact that a horse is brown is not nearly as distinctive as
the fact that it is brown, made of cylinders, and these cylinders have a particular
set of possible arrangements.



0.65%. In our test collection, this translates to 11 images of horses marked and
4 control images marked^2.

Finding using body plans has been shown to be quite effective for special
cases in a quite general sense. It is relatively insensitive to changes in aspect [11].
It is quite robust to the relatively poor segmentations that our criteria offer,
because it is quite effective in dealing with nuisance segments; in the horse
tests, the average number of four-segment groups was 2,500,000, which is an
average of forty segments per image. Nonetheless, the process described above is
crude: it is too dependent on colour and texture criteria for early segmentation;
the learning process is absent (humans) or extremely simple (horses); and there
is one recogniser per class.

3 Learning Assembly Processes from Data 

We have been studying processes for learning to assemble primitives. The
recognition processes described above have a strong component of correspondence; in

^2 These figures are not 15 and 7, because of the omission of training images and images
where the segment finder failed in estimating performance.
Fig. 3. The response ratio, (percent incoming test images marked/percent in-
coming control images marked), plotted against the percentage of test images 
marked, for various configurations of the two finding programs. Data for the 
nude human finder appears on the top, for the horse finder on the right. Capital 
letters indicate the performance of the complete system of skin/hide filter and 
geometrical grouper, and lower case letters indicate the performance of the geo- 
metrical grouper alone. The label “skin” (resp “hide”) indicates the selectivity of 
using skin (resp hide) alone as a criterion. For the human finder, the parameter 
varied is the type of group required to declare a human is present — the trend 
is that more complex groups display higher selectivity and lower recall. For the 
horse finder, the parameter being varied is the maximum number of that will be 
considered. 



particular, we are pruning a set of correspondences between image segments and
body segment labels by testing for kinematic plausibility.

The search for acceptable correspondences can be made efficient by using
projected classifiers, which prune labelings using the properties of smaller
sublabelings (as in [18], who use manually determined bounds and do not learn the
tests). Given a classifier C which is a function of a set of features whose values
depend on segments with labels in the set L = {l_1, ..., l_m}, the projected classifier
C_{l_1...l_k} is a function of all those features that depend only on the segments with
labels L' = {l_1, ..., l_k}. In particular, C_{l_1...l_k}(L') > 0 if there is some extension
L of L' such that C(L) > 0. This criterion corresponds to insisting that groups
should pass intermediate classifiers if, with appropriate segments attached, they
pass a final classifier.

The converse need not be true: the feature values required to bring a projected
point inside the positive volume of C may not be realized with any labeling of the
current set of segments 1, ..., N. For a projected classifier to be useful, it must be
easy to compute the projection, and it must be effective in rejecting labelings
at an early stage. These are strong requirements which are not satisfied by most
good classifiers; for example, in our experience a support vector machine with
a positive definite quadratic kernel projects easily but typically yields unrestrictive
projected classifiers.

We have been using an axis-aligned bounding box, with bounds learned from
a collection of positive labellings, for a good first separation, and then using
a boosted version of a weak classifier that splits the feature space on a single
feature value (as in [1]). This yields a classifier that projects particularly well,
and allows clean and efficient algorithms for computing projected classifiers and
expanding sets of labels (see [23]).
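
As a rough illustration of why an axis-aligned bounding box projects so cleanly, the sketch below learns per-feature bounds from positive examples and then tests a partial labelling using only the features whose segments have been assigned. For a box, dropping the constraints on the unassigned features gives exactly the projected classifier: some in-range value of the missing features always exists. The class name, the boolean `known` mask, and the toy numbers are assumptions made for this example, not part of the system described above.

```python
import numpy as np


class BoxClassifier:
    """Axis-aligned bounding box learned from positive feature vectors."""

    def __init__(self, positives):
        positives = np.asarray(positives, dtype=float)
        # Per-feature lower and upper bounds over the positive examples.
        self.lo = positives.min(axis=0)
        self.hi = positives.max(axis=0)

    def accepts(self, x, known=None):
        """Test a (possibly partial) feature vector.

        `known` is a boolean mask marking features that can already be
        computed from the segments labelled so far. Unknown features are
        ignored, which is the projection of the box onto the known axes.
        """
        x = np.asarray(x, dtype=float)
        if known is None:
            known = np.ones(x.shape[0], dtype=bool)
        known = np.asarray(known, dtype=bool)
        inside = (x[known] >= self.lo[known]) & (x[known] <= self.hi[known])
        return bool(np.all(inside))


# Toy positives in a three-dimensional feature space (hypothetical values).
box = BoxClassifier([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
partial_ok = box.accepts([0.5, 1.0, np.nan], known=[True, True, False])
partial_bad = box.accepts([2.0, 1.0, np.nan], known=[True, True, False])
full_bad = box.accepts([0.5, 1.0, 5.0])
```

A partial labelling rejected by the projected box can be discarded without ever trying the remaining segments, which is where the pruning comes from.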

The segment finder may find either 1 or 2 segments for each limb, depending
on whether it is bent or straight; because the pruning is so effective, we can
allow segments to be broken into two equal halves lengthwise, both of which are
tested.

3.1 Results 

The training set included 79 images without people, selected randomly from
the Corel database, and 274 images each with a single person on uniform
background. The images with people have been scanned from books of human
models [41]. All segments in the test images were reported; in the control images,
only segments whose interior corresponded to human skin in colour and texture
were reported. Control images, both for the training and for the test set, were
chosen so that all had at least 30% of their pixels similar to human skin in
colour and texture. This gives a more realistic test of the system performance
by excluding regions that are obviously not human, and reduces the number of
segments in the control images to the same order of magnitude as those in the
test images.

The models are all wearing either swim suits or no clothes, otherwise segment
finding fails; it is an open problem to segment people wearing loose clothing.
There is a wide variation in the poses of the training examples, although all body
segments are visible. The sets of segments corresponding to people were then
hand-labelled. Of the 274 images with people, segments for each body part were
found in 193 images. The remaining 81 resulted in incomplete configurations,
which could still be used for computing the bounding box used to obtain a first
separation. Since we assume that if a configuration looks like a person then its
mirror image would too, we double the number of body configurations by flipping
each one about a vertical axis. The bounding box is then computed from the
resulting 548 points in the feature space, without looking at the images without
people.

The boosted classifier was trained to separate two classes: the 193 x 2 = 386
points corresponding to body configurations, and 60727 points that did not
correspond to people but lay in the bounding box, obtained by using the bounding
box classifier to incrementally build labelings for the images with no people.
We added 1178 synthetic positive configurations obtained by randomly selecting
each limb and the torso from one of the 386 real images of body configurations
(which were rotated and scaled so the torso positions were the same in all of
them) to give an effect of joining limbs and torsos from different images, rather
like children's flip-books. Remarkably, the boosted classifier classified each of the

Features   # test images   # control images   False negatives   False positives
367        120             28                 37%               4%
567        120             86                 49%               10%



Table 1. Number of images of people and without people processed by the clas- 
sifiers with 367 and 567 features, compared with false negative (images with a 
person where no body configuration was found) and false positive (images with 
no people where a person was detected) rates. 



real data points correctly but misclassified 976 out of the 1178 synthetic
configurations as negative; the synthetic examples were unexpectedly more similar to
the negative examples than the real examples were.

The test dataset was separate from the training set and included 120 images
with a person on uniform background, and varying numbers of control
images, reported in table 1. We report results for two classifiers, one using 567
features and the other using a subset of 367 of those features. Table 1 shows the
false positive and false negative rates achieved for each of the two classifiers. By
marking 51% of test images and only 10% of control images, the classifier using
567 features compares extremely favourably with that of [8], which marked 54%
of test images and 38% of control images using hand-tuned tests to form groups
of four segments. In 55 of the 59 images where there was a false negative, a
segment corresponding to a body part was missed by the segment finder, meaning
that the overall system performance significantly understates the classifier
performance. There are few signs of overfitting, probably because the features
are highly redundant. Using the larger set of features makes labelling faster (by
a factor of about five), because more configurations are rejected earlier.

4 Shading Primitives, Shape Representations, and 
Clothing 

Finding clothed people is a far more subtle problem than finding naked people,
because the variation in colour, texture and pattern of clothing defeats a colour
segmentation strategy. Clothing does have distinctive properties: the patterns
formed by folds on clothing appear to offer cues to the configuration of the
person underneath (as any textbook on figure drawing will illustrate). These
folds have quite distinctive shading patterns [19], which are a dominant feature
of the shading field of a person clad in a loose garment because, although they
are geometrically small, the surface normal changes significantly at a fold. Folds
are best analysed using the theory of buckling, and arise from a variety of causes
including excess material, as in the case of a full skirt, and stresses on a garment
caused by body configurations. Folds appear to be the single most distinctive,
reliable and general visual cue to the configuration of a person dressed in a cotton
garment.

4.1 Grouping Folds Using a Simple Buckling Model

Garments can be modelled as elastic shells, allowing rather simple predictions
of the pattern of folds using the von Karman-Donnell equation, or a linearised
version of that equation. This is known to be a dubious source of predictions
of buckling force, but the frequencies of the eigenfunctions (which give the
buckled solutions) are accepted as fair predictions of the buckling mode for
the cases described (this is the topic of a huge literature, introduced in [5]).
The eigenfunctions allow us to predict that garments buckling in compression or
torsion will display long, nearly straight folds that are nearly parallel and nearly
evenly spaced. These folds will be approximately perpendicular to the direction
of compression and will indicate the direction of the torsion. The number of
folds depends on tension in the garment, and is hard to predict.^3 For torsion,
reasonable estimates of a garment's size yield on the order of five visible folds.
As figures 4 and 5 indicate, these predictions are accurate enough to drive a
segmentation process.

We apply the simple fold finder described in [20] to the image at twelve
different orientations. Using these twelve response maps, we use non-maximum
suppression to find the centre of the fold, and follow this maximum along the
direction of maximum response to link all points corresponding to a single fold.
The linking process breaks sharp corners, by considering the primary direction
of the preceding points along the fold.

After finding all of the folds in the image, the next step is to find pairs which
are approximately parallel, and in the same part of the image. If the projections
of the two folds onto their average direction are disjoint, they are considered to
belong to different parts of the image.
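
The pairing test above can be sketched as follows, under the simplifying assumption that each fold is summarised by the two endpoints of a straight segment. The function names and the 15 degree parallelism tolerance are illustrative choices; the text does not state what threshold was used.

```python
import numpy as np


def unit_direction(segment):
    """Unit vector from the first endpoint of a segment to the second."""
    p, q = np.asarray(segment, dtype=float)
    d = q - p
    return d / np.linalg.norm(d)


def candidate_pair(seg_a, seg_b, max_angle_deg=15.0):
    """True if two folds are roughly parallel and overlap along their
    average direction (i.e. lie in the same part of the image)."""
    da, db = unit_direction(seg_a), unit_direction(seg_b)
    if np.dot(da, db) < 0:
        db = -db  # a fold has no preferred sense, so align the two directions
    cos_angle = np.clip(np.dot(da, db), -1.0, 1.0)
    if np.degrees(np.arccos(cos_angle)) > max_angle_deg:
        return False
    average = da + db
    average /= np.linalg.norm(average)
    # Project both endpoints of each fold onto the average direction.
    lo_a, hi_a = sorted(np.asarray(seg_a, dtype=float) @ average)
    lo_b, hi_b = sorted(np.asarray(seg_b, dtype=float) @ average)
    # The projected intervals are disjoint when one ends before the other starts.
    return bool(max(lo_a, lo_b) <= min(hi_a, hi_b))


fold_a = ((0.0, 0.0), (10.0, 0.0))
fold_b = ((2.0, 1.0), (8.0, 1.0))      # parallel and alongside fold_a
fold_c = ((20.0, 1.0), (30.0, 1.0))    # parallel but further along the image
fold_d = ((0.0, 0.0), (0.0, 10.0))     # perpendicular to fold_a
```

Here `fold_a` and `fold_b` form a candidate pair, while `fold_c` fails the overlap test and `fold_d` fails the parallelism test.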

From the theory, we expect that multiple folds will be at regularly spaced
intervals. Thus, we look for pairs which have one common fold, and consistent
separations. (The separations should either be the same, or one should be double
the other; if a single fold gets dropped, we do not want to ignore the entire
pattern.) The separation between folds is required to be less than the maximum
length of the folds. Finally, some of these groups can be further combined, if the
groups have almost the same set of folds.
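
The separation rule in the preceding paragraph (equal spacing, or one spacing double the other to allow for a dropped fold) can be written as a small predicate. This is only a sketch: the relative tolerance `tol` is an assumed parameter, since the text does not say how closely the separations must agree.

```python
def consistent_separations(sep_1, sep_2, tol=0.2):
    """True if two fold separations are about equal, or one is about
    double the other (as happens when a single fold is missed)."""
    if sep_1 <= 0 or sep_2 <= 0:
        return False
    small, large = sorted((sep_1, sep_2))
    ratio = large / small
    return abs(ratio - 1.0) <= tol or abs(ratio - 2.0) <= tol


def separation_allowed(separation, fold_lengths):
    """The separation must be less than the maximum length of the folds."""
    return separation < max(fold_lengths)
```

For example, separations of 10 and 19 pixels would be treated as consistent (ratio close to 2), whereas 10 and 15 would not.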

The program typically extracts 10-25 groups of folds from an image. Figure 4
shows one image with three typical groups. The group in 4(b) clearly corresponds
to the major folds across the torso in the image. This is in fact a segmentation
of the image into coherent regions consisting of possible pieces of cloth. The
region covered by the folds in (b) is most of the torso of the figure, and suggests
a likely candidate for consideration as a torso. There are other groups as well,
such as (c), the Venetian blinds, and (d), an aliased version of (c), but these
extra segments are easily dealt with by higher level processes.

Any image of man-made scenes will have a number of straight parallel lines
which may have similar shading to folds (see for example figure 6). While this

^3 This can be demonstrated with a simple experiment. Wearing a loose but tucked-in
T-shirt, bend forward at the waist; the shirt hangs in a single fold. Now pull the 
T-shirt taut against your abdomen and bend forward; many narrow folds form. 







Fig. 4. Results of a segmenter that obtains regions by grouping folds that satisfy 
the qualitative predictions of the linear buckling theory, (a) An image showing 
folds corresponding to torsional buckling. (b,c,d) Three groups of folds found by 
our program. The group in (b) is, in essence the torso; it contains the major folds 
across the torso, and can be used to represent the torso. An edge detector could 
not extract the outline points of the torso from this image, since the Venetian 
blinds would result in a mess of edges. The group of fold responses in (c) is due 
to the Venetian blinds in the background. Such a large set of parallel lines is 
unlikely to come from a picture of a torso, since it would require the torso to be 
unrealistically long, (d) A group that is an aliased version of the group in (c). 
Each group has quite high level semantics for segmenter output; in particular, 
groups represent image regions that could be clothing. 



may initially be interpreted as groups of folds (and hence as clothing), higher level
reasoning should enable us to reject these groups as coming from something
other than folds in cloth.



4.2 Grouping Folds by Sampling 



An alternative approach is to obtain groups which are samples from a posterior
on groups given image data. This approach has the virtue that we do not need



Fig. 5. Further examples of segmentations produced by our grouping process. 
The figures show groups of fold responses, for the torsional (b,f) and axial (d,h) 
cases. In some cases, more than one group should be fused to get the final extent
of the torso — these groups are separated by circles in the image. In each case, 
there are a series of between 10 and 25 other groups, representing either aliasing 
effects, the Venetian blinds, or other accidental events. Each group could be a 
region of clothing; more high-level information is required to tell which is and 
which is not. 




Fig. 6. There are parallel folds that appear without clothing, too; (a) An image 
of an architectural curiosity, (b) One of four groups of folds found in the image. 
It is certainly expected that in images of man-made scenes, there will be a large 
number of nearly-parallel lines, which may be interpreted as groups of folds. Other 
cues should allow us to determine that this is not in fact clothing. 



to come up with a detailed physical model of garment buckling (a process
complicated by cloth anisotropy, etc.). A simple likelihood model can be fitted to
groups in real images, instead.

We describe each group of folds by a coordinate system and a series of
variables which describe the scale of the folds, their angle, and their location with
respect to a coordinate system. We also include the change in angle between
adjacent folds (this enables us to describe star-shaped folds). By examining a
number of groups in real images, we estimated a probability distribution on the
parameters of the coordinate system. This allows us to describe how likely a
group with those parameters is. We also estimated the probability distribution
for individual folds within a group.

The folds are grouped by running a reversible jump Markov Chain Monte
Carlo algorithm (as in [17]). If a fold has a high likelihood of belonging to a
particular group, an assignment of the fold to that group should be fairly stable.
In other words, it will have a high probability in the stationary distribution. The
assignments which appear most frequently over a large number of iterations are
taken to be the correct grouping. Proposal moves for this group are:

1. Add a new group. Two folds which have not previously been assigned to
another group are combined to form a new group.

2. Delete a two-fold group.

3. Change the parameters of a group.

4. Add a fold to a group. An unassigned fold is assigned to an existing group.

5. Remove a fold from a group.

6. Change the group of a fold. Change the group assignment of a fold.

After several thousand iterations, we observe that the MCMC spends a
relatively high proportion of its time in certain states. We take the grouping in
the most popular state to be the best grouping of folds for the images. Figure 7
shows an image and the most popular grouping of folds. (The lower level fold
finder is not yet robust enough to generate reliable folds, so the putative folds
here were marked by hand.) Parallel groups are taken to be a unit, and the edge
of the figure is largely ignored, as desired.
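
Picking the "most popular state" from a run of the sampler amounts to tallying how often each grouping is visited. The sketch below does only that bookkeeping; it does not implement the reversible jump sampler or the likelihood model. It assumes, purely for illustration, that each sample is a dictionary from fold index to a group identifier (or `None` for an unassigned fold), and it treats two samples as the same state when they partition the folds identically, whatever the group identifiers are called.

```python
from collections import Counter


def canonical(assignment):
    """Reduce a fold -> group-id mapping to a set of groups, so that the
    arbitrary names of the groups do not matter."""
    groups = {}
    for fold, group_id in assignment.items():
        if group_id is not None:  # None marks an unassigned fold
            groups.setdefault(group_id, set()).add(fold)
    return frozenset(frozenset(members) for members in groups.values())


def most_popular_grouping(samples):
    """Return the most frequently visited grouping and the fraction of
    iterations the sampler spent in it."""
    counts = Counter(canonical(sample) for sample in samples)
    grouping, visits = counts.most_common(1)[0]
    return grouping, visits / len(samples)


# Three hypothetical samples: the first two are the same grouping under
# different group names, the third has only fold 0 in a group.
samples = [
    {0: "a", 1: "a", 2: None},
    {0: "b", 1: "b", 2: None},
    {0: "a", 1: None, 2: None},
]
grouping, fraction = most_popular_grouping(samples)
```

In practice one would probably discard an initial burn-in portion of the chain before tallying; that choice is not discussed in the text.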

4.3 Choosing Primitives and Building Representations 

Clothing is an interesting case because it is not obvious that folds are the right
primitive to use. This raises the standard difficult question that any theory
based on primitives must address: how do we determine what is to be a
primitive? As a possible alternative to our current fold finder, we have been
studying a mechanism for determining what should be a primitive, following the
ideas of [12]. We obtained a large set of image subregions showing regions of folds
at the same orientation and scale. There is a comparison set, containing non-folds
that are not easy to distinguish from the folds using crude methods (e.g. a linear
classifier on principal components). As measurements, we use spatial relations
between filter outputs, for a reasonable set of filters at a variety of scales.
We take a uniform sample of subimages from each set.

The task is now to explore the structure of the clothing set with respect
to the non-clothing set. We do this by setting up a decision tree; each decision



314 David Forsyth, John Haddon, and Sergey Ioffe




Fig. 7. The Markov Chain Monte Carlo method can be used to group folds to- 
gether. (a) The original image, (b) Folds marked by hand, but grouped automat- 
ically. This is the most popular grouping of the image, after 10,000 iterations. 
Note that parallel folds are grouped together, and that the outline of the figure is 
largely ignored. 



attempts to split the set at the leaf using an entropy criterion. The
measurement used is the value of the output of one filter at one point; the
choice of filter and point is given by the entropy criterion. The approach
can be thought of as supervised learning of segmentation: we are training a
decision tree to separate windows associated with objects from those that are
not.

We split to several levels (a total of twelve leaves in the current
experiments) and then use the representation at each leaf as a primitive. In
particular, a leaf is defined by a series of filter outputs at a series of
points; at each leaf we have an estimate of the frequency of observation of
this pattern given clothing and given no clothing. The remaining task is to
postprocess the set of primitives to remove translational redundancies.
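
One way to picture the entropy criterion is the sketch below. It is not the
procedure used in the experiments: it assumes each window has already been
reduced to a list of scalar measurements (one per filter/point pair), that a
split is a simple threshold on one measurement, and that the best split is
the one minimising the size-weighted entropy of the two children. All names
are hypothetical.

```python
import math

def entropy(n_pos, n_neg):
    """Binary entropy (in bits) of a set with n_pos / n_neg members."""
    total = n_pos + n_neg
    h = 0.0
    for n in (n_pos, n_neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def best_split(samples, labels):
    """samples: list of measurement vectors; labels: 1 = clothing, 0 = not.
    Returns (measurement_index, threshold, weighted_child_entropy)."""
    best = None
    for j in range(len(samples[0])):
        values = sorted({s[j] for s in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between observed values
            left = [y for s, y in zip(samples, labels) if s[j] <= t]
            right = [y for s, y in zip(samples, labels) if s[j] > t]
            h = (len(left) * entropy(sum(left), len(left) - sum(left))
                 + len(right) * entropy(sum(right), len(right) - sum(right))
                 ) / len(labels)
            if best is None or h < best[2]:
                best = (j, t, h)
    return best
```

Applying `best_split` recursively to the two children, until a chosen number
of leaves is reached, would give a tree of the kind described above.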



5 Conclusions 

For recognition systems to be practically useful, we need a system of
representation that can handle a reasonable level of abstraction and that can
support segmentation from quite general backgrounds. These requirements
strongly suggest representations in terms of relations between primitives. We
have shown that, using a simple primitive that is obviously convenient and
useful, it is possible to

Fig. 8. A representation of the decision tree used to find fold primitives. Each
leaf contains a few windows representative of image windows classified at that 
level; on the left, clothing, and on the right, non-clothing. Below each leaf is the 
number of clothing and non-clothing windows that arrived at that leaf, out of a 
total of 128 in each category. 110 clothing and 2 non-clothing windows arrive at 
one leaf, strongly suggesting this combination of filter outputs is an appropriate 
clothing primitive. 




Fig. 9. Folds in clothing result from buckling and have quite characteristic shad- 
ing and spatial properties, which are linked to the configuration of the person. 

(a) shows the probability that an image window centered at each point contains 
a clothing primitive, using automatically defined primitives sketched in figure 8; 

(b) shows lines of primitives linked together using an extremisation criterion. 
Note that edges are in general not marked, and that the process is insensitive to 
changes in albedo; these properties are a result of the learning process. 









build relational representations that are quite effective at finding naked
people and horses. Furthermore, we have shown that a grouping process that
finds such assemblies can be learned from data. These representations are
crucially limited by the crude primitives used.

Primitives need not just be stylised shapes. The stylised appearance of folds
in clothing means that we can study the appearance of clothing in a
reasonably effective way. These are shading primitives. Although it is
currently difficult to know how to choose primitives, the problem appears to
be statistical on its face. Statistical criteria appear to be able to suggest
promising choices of shading primitives from image data.

Abstract. Finding an appropriate set of features is an essential problem
in the design of shape recognition systems. This paper attempts to show
that for recognizing simple objects with high shape variability, such as
handwritten characters, it is possible, and even advantageous, to feed the
system directly with minimally processed images and to rely on learning
to extract the right set of features. Convolutional Neural Networks are
shown to be particularly well suited to this task. We also show that these
networks can be used to recognize multiple objects without requiring
explicit segmentation of the objects from their surrounding. The second
part of the paper presents the Graph Transformer Network model, which
extends the applicability of gradient-based learning to systems that use
graphs to represent features, objects, and their combinations.

1 Learning the Right Features 

The most commonly accepted model of pattern recognition is composed of a
segmenter, whose role is to extract objects of interest from their
background; a hand-crafted feature extractor that gathers relevant
information from the input and eliminates irrelevant variabilities; and a
classifier which categorizes the resulting feature representations (generally
vectors or strings of symbols) into categories. There are three major methods
for classification: template matching matches the feature representation to a
set of class templates; generative methods use a probability density model
for each class, and pick the class with the highest likelihood of generating
the feature representation; discriminative models compute a discriminant
function that directly produces a score for each class. Generative and
discriminative models are often estimated (learned) from training samples. In
all of these approaches, the overall performance of the system is largely
determined by the quality of the segmenter and the feature extractor. Because
they are hand-crafted, the segmenter and feature extractor often rely on
simplifying assumptions about the input data, and can rarely take into
account all the variability of the real world. An ideal solution to this
problem is to feed the entire system with minimally processed inputs (e.g.
"raw" pixel images), and train it from data so as to minimize an overall loss
function (which maximizes a given performance measure). Keeping the
preprocessing to a minimum ensures that no unrealistic assumption is made
about the data. Unfortunately, that also requires to come up with a suitable
learning architecture that can handle the high dimension of the input (number
of pixels), the high degree of variability due to pose variations or
geometric distortions among other things, and the necessarily complex
non-linear relation between the input and the output.

Yann LeCun et al.

Gradient-Based Learning provides a framework in which to build such a
system. The learning machine computes a function Y^p = F(Z^p, W), where Z^p
is the p-th input pattern, and W represents the collection of adjustable
parameters in the system. The output Y^p contains scores or probabilities for
each category. A loss function E^p = D(D^p, F(W, Z^p)) measures the
discrepancy between D^p, the "correct" output for pattern Z^p, and the output
produced by the system. The average loss function E_train(W) is the average
of the errors E^p over a set of labeled examples called the training set
{(Z^1, D^1), ..., (Z^P, D^P)}. In the simplest setting, the learning problem
consists in finding the value of W that minimizes E_train(W).

Making the loss function differentiable with respect to W ensures that
efficient gradient-based non-linear optimization methods can be used to find
a minimum. To ensure global differentiability, the system is built as a
feed-forward network of modules. In the simplest case, each module computes a
function X_n = F_n(W_n, X_{n-1}), where X_n is an object (a vector in the
simplest case) representing the output of the module, W_n is the vector of
tunable (trainable) parameters in the module (a subset of W), and X_{n-1} is
the module's input (as well as the previous module's output). The input X_0
to the first module is the system's input pattern Z^p.

The main idea of Gradient-Based Learning, which is a simple extension of the
well-known back-propagation neural network learning algorithm, is that the
objective function can be efficiently minimized through gradient descent (or
other more sophisticated non-linear optimization methods) because the
gradient of E with respect to W can be efficiently computed with a backward
recurrence through the network of modules. If the partial derivative of E^p
with respect to X_n is known, then the partial derivatives of E^p with
respect to W_n and X_{n-1} can be computed using the following backward
recurrence:



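\[
\frac{\partial E^p}{\partial W_n}
  = \frac{\partial F_n}{\partial W}(W_n, X_{n-1}) \,
    \frac{\partial E^p}{\partial X_n},
\qquad
\frac{\partial E^p}{\partial X_{n-1}}
  = \frac{\partial F_n}{\partial X}(W_n, X_{n-1}) \,
    \frac{\partial E^p}{\partial X_n}
\tag{1}
\]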



where \partial F_n / \partial W (W_n, X_{n-1}) is the Jacobian of F_n with
respect to W evaluated at the point (W_n, X_{n-1}), and
\partial F_n / \partial X (W_n, X_{n-1}) is the Jacobian of F_n with respect
to X. The first equation computes some terms of the gradient of E^p(W), while
the second equation propagates the partial gradients backward. The idea can
be trivially extended to any network of functional modules. A completely
rigorous derivation of the gradient propagation procedure in the general case
can be done using Lagrange functions [14, 15, 2].
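
The backward recurrence can be made concrete with a small sketch. It assumes
only two module types (a bias-free linear module and a component-wise tanh
module) and a squared-error loss; these choices, and all class and function
names, are illustrative assumptions rather than the system described in the
text.

```python
import math

class Linear:
    """X_n = W_n X_{n-1}, with W_n stored as a list of rows."""
    def __init__(self, weights):
        self.w = weights

    def forward(self, x):
        self.x = x  # remember the input for the backward pass
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in self.w]

    def backward(self, grad_out):
        # dE/dW_n[i][j] = dE/dX_n[i] * X_{n-1}[j]
        self.grad_w = [[g * xj for xj in self.x] for g in grad_out]
        # dE/dX_{n-1}[j] = sum_i W_n[i][j] * dE/dX_n[i]
        return [sum(self.w[i][j] * grad_out[i] for i in range(len(self.w)))
                for j in range(len(self.x))]

class Tanh:
    """Parameter-free module: X_n = tanh(X_{n-1}) component-wise."""
    def forward(self, x):
        self.y = [math.tanh(v) for v in x]
        return self.y

    def backward(self, grad_out):
        return [g * (1.0 - y * y) for g, y in zip(grad_out, self.y)]

def loss_and_backprop(modules, z, target):
    """Forward pass, squared-error loss, then the backward recurrence."""
    x = z
    for m in modules:
        x = m.forward(x)
    loss = 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))
    grad = [xi - ti for xi, ti in zip(x, target)]  # dE/dX_N
    for m in reversed(modules):
        grad = m.backward(grad)  # dE/dX_n -> dE/dX_{n-1}
    return loss, grad
```

After a call to `loss_and_backprop`, each `Linear` module holds its own part
of the gradient in `grad_w`, which a gradient descent step could then use.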




Object Recognition with Gradient-Based Learning

2 Shape Recognition with Convolutional Neural
Networks

Traditional multi-layer neural networks are a special case of the above where
the states X_n are fixed-size vectors, and where the modules are alternated
layers of matrix multiplications (the weights) and component-wise sigmoid
functions (the units). Traditional multilayer neural nets, where all the
units in a layer are connected to all the units in the next layer, can be
used to recognize raw (roughly size-normalized and centered) images, but
there are problems.

Firstly, typical images are large, often with several hundred variables
(pixels). A fully-connected network with, say, 100 units in the first layer
would already contain several tens of thousands of weights. Such a large
number of parameters increases the capacity of the system and therefore
requires a larger training set. But the main deficiency of unstructured nets
is that they have no built-in invariance with respect to translations, scale,
or geometric distortions of the inputs. Images of objects can be
approximately size-normalized and centered, but no such preprocessing can be
perfect. This, combined with intrinsic within-class variability, will cause
variations in the position of distinctive features in input objects. In
principle, a fully-connected network of sufficient size could learn to
produce outputs that are invariant with respect to such variations. However,
learning such a task would probably result in multiple units with similar
weight patterns positioned at various locations in the input so as to detect
distinctive features wherever they appear on the input. Learning these weight
configurations requires a very large number of training instances to cover
the space of possible variations. In convolutional networks, described below,
the robustness to geometric distortions is automatically obtained by forcing
the replication of weight configurations across space.

Secondly, a deficiency of fully-connected architectures is that the topology
of the input is entirely ignored. The input variables can be presented in any
(fixed) order without affecting the outcome of the training. On the contrary,
images have a strong 2D local structure: variables (pixels) that are
spatially nearby are highly correlated. Local correlations are the reasons
for the well-known advantages of extracting and combining local features
before recognizing spatial or temporal objects, because configurations of
neighboring variables can be classified into a small number of relevant
categories (e.g. edges, corners...). Convolutional Networks force the
extraction of local features by restricting the receptive fields of hidden
units to be local.



2.1 Convolutional Networks 

Convolutional Networks combine three architectural ideas to ensure some
degree of shift, scale, and distortion invariance: local receptive fields,
shared weights (or weight replication), and spatial sub-sampling. A typical
convolutional network for recognizing shapes, dubbed LeNet-5, is shown in
figure 1. The input plane receives images of objects that are approximately
size-normalized and centered.

[Figure 1 diagram. Stage labels, left to right: Convolutions, Subsampling,
Convolutions, Subsampling, Full connection. Legible layer label: C3: f. maps
16@10x10.]

Fig. 1. Architecture of LeNet-5, a Convolutional Neural Network, here for
digits recognition. Each plane is a feature map, i.e. a set of units whose
weights are constrained to be identical.



Each unit in a layer receives inputs from a set of units located in a small
neighborhood in the previous layer. The idea of connecting units to local
receptive fields on the input goes back to the early 60s, and was largely
inspired by Hubel and Wiesel's discovery of locally-sensitive,
orientation-selective neurons in the cat's visual system [9]. Local
connections have been used many times in neural models of visual learning
[7, 13, 16, 23]. With local receptive fields, neurons can learn to extract
elementary visual features such as oriented edges, end-points, corners (or
similar features in other signals such as speech spectrograms). These
features are then combined by the subsequent layers in order to detect
higher-order features. As stated earlier, distortions or shifts of the input
can cause the position of salient features to vary. In addition, elementary
feature detectors that are useful on one part of the image are likely to be
useful across the entire image. This knowledge can be applied by forcing a
set of units, whose receptive fields are located at different places on the
image, to have identical weight vectors [2, 16].

Units in a layer are organized in planes within which all the units share the
same set of weights. The set of outputs of the units in such a plane is
called a feature map. Units in a feature map are all constrained to perform
the same operation on different parts of the image. A complete convolutional
layer is composed of several feature maps (with different weight vectors), so
that multiple features can be extracted at each location. A concrete example
of this is the first layer of LeNet-5 shown in Figure 1. Units in the first
hidden layer of LeNet-5 are organized in 6 planes, each of which is a feature
map. A unit in a feature map has 25 inputs connected to a 5 by 5 area in the
input, called the receptive field of the unit. Each unit has 25 inputs, and
therefore 25 trainable coefficients plus a trainable bias. The receptive
fields of contiguous units in a feature map are centered on correspondingly
contiguous units in the previous layer. Therefore receptive fields of
neighboring units overlap. For example, in the first hidden layer of LeNet-5,
the receptive fields of horizontally contiguous units overlap by 4 columns
and 5 rows. As stated earlier, all the units in a feature map share the same
set of 25 weights and the same bias, so they detect the same feature at all
possible locations on the input. The other feature maps in the layer use
different sets of weights and biases, thereby extracting different types of
local features. In the case of LeNet-5, at each input location six different
types of features are extracted by six units in identical locations in the
six feature maps. A sequential implementation of a feature map would scan the
input image with a single unit that has a local receptive field, and store
the states of this unit at corresponding locations in the feature map. This
operation is equivalent to a convolution, followed by an additive bias and
squashing function, hence the name convolutional network. The kernel of the
convolution is the set of connection weights used by the units in the feature
map. An interesting property of convolutional layers is that if the input
image is shifted, the feature map output will be shifted by the same amount,
but will be left unchanged otherwise. This property is at the basis of the
robustness of convolutional networks to shifts and distortions of the input.
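
The sequential implementation of a feature map, and the shift property, can
be illustrated with a short sketch. It assumes tanh as the squashing function
and zero padding when shifting; like most neural network code it does not
flip the kernel, so strictly it computes a correlation. Function names are
hypothetical.

```python
import math

def feature_map(image, kernel, bias):
    """Scan `image` with a single unit: weighted sum over a k x k receptive
    field, plus a bias, passed through tanh (the squashing function)."""
    k = len(kernel)
    rows, cols = len(image) - k + 1, len(image[0]) - k + 1
    return [[math.tanh(bias + sum(kernel[a][b] * image[i + a][j + b]
                                  for a in range(k) for b in range(k)))
             for j in range(cols)]
            for i in range(rows)]

def shift_right(image, s):
    """Shift the image s columns to the right, padding with zeros."""
    return [[0.0] * s + row[:len(row) - s] for row in image]
```

With a 5x5 kernel, a 32x32 input yields a 28x28 feature map, and shifting the
input by one column shifts the feature map by one column away from the
border.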

Once a feature has been detected, its exact location becomes less important.
Only its approximate position relative to other features is relevant. Using
handwritten digits as an example, once we know that the input image contains
the endpoint of a roughly horizontal segment in the upper left area, a corner
in the upper right area, and the endpoint of a roughly vertical segment in
the lower portion of the image, we can tell the input image is a 7. Not only
is the precise position of each of those features irrelevant for identifying
the pattern, it is potentially harmful because the positions are likely to
vary for different instances of the shape. A simple way to reduce the
precision with which the positions of distinctive features are encoded in a
feature map is to reduce the spatial resolution of the feature map. This can
be achieved with so-called sub-sampling layers, which perform a local
averaging and a sub-sampling, reducing the resolution of the feature map and
reducing the sensitivity of the output to shifts and distortions. The second
hidden layer of LeNet-5 is a sub-sampling layer. This layer comprises six
feature maps, one for each feature map in the previous layer. The receptive
field of each unit is a 2 by 2 area in the previous layer's corresponding
feature map. Each unit computes the average of its four inputs, multiplies it
by a trainable coefficient, adds a trainable bias, and passes the result
through a sigmoid function. Contiguous units have non-overlapping contiguous
receptive fields. Consequently, a sub-sampling layer feature map has half the
number of rows and columns as the feature maps in the previous layer. The
trainable coefficient and bias control the effect of the sigmoid
non-linearity. If the coefficient is small, then the unit operates in a
quasi-linear mode, and the sub-sampling layer merely blurs the input. If the
coefficient is large, sub-sampling units can be seen as performing a "noisy
OR" or a "noisy AND" function depending on the value of the bias. Successive
layers of convolutions and sub-sampling are typically alternated, resulting
in a "bi-pyramid": at each layer, the number of feature maps is increased as
the spatial resolution is decreased. Each unit in the third hidden layer in
figure 1 may have input connections from several feature maps in the previous
layer. The convolution/sub-sampling combination, inspired by Hubel and
Wiesel's notions of "simple" and "complex" cells, was implemented in
Fukushima's Neocognitron, though no globally supervised learning procedure
such as back-propagation was available then. A large degree of invariance to
geometric transformations of the input can be achieved with this progressive
reduction of spatial resolution compensated by a progressive increase of the
richness of the representation (the number of feature maps).
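
A sub-sampling feature map as described above can be sketched in a few lines.
The sketch assumes the logistic function as the sigmoid and a single
(coefficient, bias) pair per feature map; the function name is hypothetical.

```python
import math

def subsample(fmap, coeff, bias):
    """Average non-overlapping 2x2 blocks, scale by a trainable coefficient,
    add a trainable bias, and pass the result through a logistic sigmoid."""
    out = []
    for i in range(0, len(fmap) - 1, 2):
        row = []
        for j in range(0, len(fmap[0]) - 1, 2):
            avg = (fmap[i][j] + fmap[i][j + 1]
                   + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            row.append(1.0 / (1.0 + math.exp(-(coeff * avg + bias))))
        out.append(row)
    return out
```

Applied to a 28x28 feature map this returns a 14x14 map, halving the number
of rows and columns.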

Since all the weights are learned with back-propagation, convolutional
networks can be seen as synthesizing their own feature extractors, and tuning
them to the task at hand. The weight sharing technique has the interesting
side effect of reducing the number of free parameters, thereby reducing the
"capacity" of the machine and reducing the gap between test error and
training error [16]. The network in figure 1 contains 345,308 connections,
but only 60,000 trainable free parameters because of the weight sharing.

Fixed-size Convolutional Networks have been applied to many applications,
among others handwriting recognition [12, 1], as well as machine-printed
character recognition [32], on-line handwriting recognition [1], and face
recognition [12]. Fixed-size convolutional networks that share weights along
a single temporal dimension are known as Time-Delay Neural Networks (TDNNs),
and are applied widely in speech processing and time-series prediction.
Variable-size convolutional networks, which have applications in object
detection and location, are described in section 3.



2.2 LeNet-5 

This section describes in more detail the architecture of LeNet-5, the
Convolutional Neural Network used in the experiments. LeNet-5 comprises 7
layers, not counting the input, all of which contain trainable parameters
(weights). The input is a 32x32 pixel image. Input shapes should be
significantly smaller than that (e.g. on the order of 20x20 pixels). The
reason is that it is desirable that potential distinctive features such as
end-points or corners can appear in the center of the receptive field of the
highest-level feature detectors. In LeNet-5 the set of centers of the
receptive fields of the last convolutional layer (C3, see below) form a 20x20
area in the center of the 32x32 input. The values of the input pixels are
normalized so that the background level (white) corresponds to a value of
-0.1 and the foreground (black) corresponds to 1.175. This makes the mean
input roughly 0 and the variance roughly 1, which accelerates learning [20].
In the following, convolutional layers are labeled Cx, sub-sampling layers
are labeled Sx, and fully-connected layers are labeled Fx, where x is the
layer index.

Layer C1 is a convolutional layer with 6 feature maps. Each unit in each
feature map is connected to a 5x5 neighborhood in the input. The size of the
feature maps is 28x28, which prevents connections from the input from falling
off the boundary. C1 contains 156 trainable parameters and 122,304
connections.

Layer S2 is a sub-sampling layer with 6 feature maps of size 14x14. Each unit
in each feature map is connected to a 2x2 neighborhood in the corresponding
feature map in C1. The four inputs to a unit in S2 are added, then multiplied
by a trainable coefficient, and added to a trainable bias. The result is
passed through a sigmoidal function. The 2x2 receptive fields are
non-overlapping, therefore feature maps in S2 have half the number of rows
and columns as feature maps in C1. Layer S2 has 12 trainable parameters and
5,880 connections.

Layer C3 is a convolutional layer with 16 feature maps. Each unit in each
feature map is connected to several 5x5 neighborhoods at identical locations
in a subset of S2's feature maps. Why not connect every S2 feature map to
every C3 feature map? The reason is twofold. First, a non-complete connection
scheme keeps the number of connections within reasonable bounds. More
importantly, it forces a break of symmetry in the network. Different feature
maps are forced to extract different (hopefully complementary) features
because they get different sets of inputs. The rationale behind the
connection scheme is the following. The first six C3 feature maps take inputs
from every contiguous subset of three feature maps in S2. The next six take
input from every contiguous subset of four. The next three take input from
some discontinuous subsets of four. Finally, the last one takes input from
all S2 feature maps. The full connection table is given in [19]. Layer C3 has
1,516 trainable parameters and 156,000 connections.

Layer S4 is a sub-sampling layer with 16 feature maps of size 5x5. Each unit
in each feature map is connected to a 2x2 neighborhood in the corresponding
feature map in C3, in a similar way as C1 and S2. Layer S4 has 32 trainable
parameters and 2,000 connections.

Layer C5 is a convolutional layer with 120 feature maps. Each unit is
connected to a 5x5 neighborhood on all 16 of S4's feature maps. Here, because
the size of S4 is also 5x5, the size of C5's feature maps is 1x1: this
amounts to a full connection between S4 and C5. C5 is labeled as a
convolutional layer, instead of a fully-connected layer, because if the
LeNet-5 input were made bigger with everything else kept constant, the
feature map dimension would be larger than 1x1. This process of dynamically
increasing the size of a convolutional network is described in Section 3.
Layer C5 has 48,120 trainable connections.

Layer F6 contains 84 units (the reason for this number comes from the design
of the output layer, explained later) and is fully connected to C5. It has
10,164 trainable parameters.

As in classical neural networks, units in layers up to F6 compute a dot
product between their input vector and their weight vector, to which a bias
is added. This weighted sum is then passed through a scaled hyperbolic
tangent function to produce the state of the unit.
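
The parameter counts quoted for each layer can be checked with a little
arithmetic. The sketch below assumes one 5x5 kernel per incoming feature map
plus one bias per output feature map in convolutional layers, and one
coefficient plus one bias per feature map in sub-sampling layers; the helper
name is hypothetical.

```python
def conv_params(inputs_per_map, k=5):
    """Trainable parameters of a convolutional layer: for each output
    feature map, one k*k kernel per incoming feature map plus one bias."""
    return sum(n_in * k * k + 1 for n_in in inputs_per_map)

c1 = conv_params([1] * 6)                            # 6 maps on the input
s2 = 6 * 2                                           # coefficient + bias
c3 = conv_params([3] * 6 + [4] * 6 + [4] * 3 + [6])  # partial connections
s4 = 16 * 2
c5 = conv_params([16] * 120)
f6 = 84 * (120 + 1)                                  # weights + bias per unit
total = c1 + s2 + c3 + s4 + c5 + f6

# Connection counts for three layers: units x (inputs + bias) per unit.
c1_connections = 28 * 28 * 6 * (25 + 1)
s2_connections = 14 * 14 * 6 * (4 + 1)
s4_connections = 5 * 5 * 16 * (4 + 1)
```

Under these assumptions the per-layer values are 156, 12, 1,516, 32, 48,120
and 10,164, which sum to the 60,000 trainable parameters quoted in section
2.1.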

Finally, the output layer is composed of Euclidean Radial Basis Function
units (RBF), one for each class, with 84 inputs each. Each output RBF unit
computes the Euclidean distance between its input vector and its parameter
vector. The output of a particular RBF can be interpreted as a penalty term
measuring the fit between the input pattern and a model of the class
associated with the RBF. Given an input pattern, the loss function should be
designed so as to get the configuration of F6 as close as possible to the
parameter vector of the RBF that corresponds to the pattern's desired class.
The parameter vectors of these units were chosen by hand and kept fixed (at
least initially). The components of those parameter vectors were set to -1 or
+1, to predetermined values. The parameter vectors of the RBFs play the role
of target vectors for layer F6.

The simplest output loss function that can be used with the above network
is:

\[
E(W) = \frac{1}{P} \sum_{p=1}^{P} y_{D^p}(Z^p, W)
\tag{2}
\]

where y_{D^p} is the output of the D^p-th RBF unit, i.e. the one that
corresponds to the correct class of input pattern Z^p. The actual loss
function used in our experiments has an additional term to make it more
discriminative. More details are available in [19]. Computing the gradient of
the loss function with respect to all the weights in all the layers of the
convolutional network is done with back-propagation. The standard algorithm
must be slightly modified to take account of the weight sharing. An easy way
to implement it is to first compute the partial derivatives of the loss
function with respect to each connection, as if the network were a
conventional multi-layer network without weight sharing. Then the partial
derivatives of all the connections that share a same parameter are added to
form the derivative with respect to that parameter.
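
The RBF output layer and the loss of equation (2) can be sketched as below.
Two assumptions are made for illustration: the penalty is taken to be the
squared Euclidean distance (which avoids a square root and does not change
which class is closest), and the toy parameter vectors have 4 components
rather than 84. Function names are hypothetical.

```python
def rbf_outputs(f6_state, class_vectors):
    """Penalty of each class: squared Euclidean distance (an assumption)
    between the F6 state and that class's fixed parameter vector."""
    return [sum((x - w) ** 2 for x, w in zip(f6_state, vec))
            for vec in class_vectors]

def average_loss(f6_states, desired_classes, class_vectors):
    """Equation (2): mean over patterns of the desired class's RBF output."""
    penalties = [rbf_outputs(s, class_vectors)[d]
                 for s, d in zip(f6_states, desired_classes)]
    return sum(penalties) / len(penalties)

def classify(f6_state, class_vectors):
    """Pick the class whose RBF output (penalty) is smallest."""
    y = rbf_outputs(f6_state, class_vectors)
    return min(range(len(y)), key=y.__getitem__)
```

Because the parameter vectors have components of -1 or +1, an F6 state close
to one of them gives a small penalty for that class and a large one for the
others.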

2.3 An Example: Recognizing Handwritten Digits 

Recognizing individual digits is an excellent benchmark for comparing shape
recognition methods. This comparative study concentrates on adaptive methods
that operate directly on size-normalized images. Handwritten digit
recognition may seem a little simplistic when one's interest is Computer
Vision, but the simplicity is only apparent, and the problems to solve are
essentially the same as with any 2D shape recognition, only there is abundant
training data available, and the intra-class shape variability is
considerably larger than with any rigid object recognition problem.

The database used to train and test the systems described in this paper was
constructed from NIST's Special Database 3 and Special Database 1, containing
binary images of handwritten digits. From these we built a database called
MNIST which contains 60,000 training samples (half from SD1, half from SD3)
and 10,000 test images (half from SD1 and half from SD3). The original black
and white (bilevel) images were size normalized to fit in a 20x20 pixel box
while preserving their aspect ratio. The resulting images contain grey levels
as a result of the anti-aliased resampling. Three versions of the database
were used. In the first version, the images were centered in a 28x28 image by
computing the center of mass of the pixels, and translating the image so as
to position this point at the center of the 28x28 field. In some instances,
this 28x28 field was extended to 32x32 with background pixels. This version
of the database will be referred to as the regular database. In the second
version of the database (referred to as the deslanted version), the character
images were deslanted using the moments of inertia of the black pixels and
cropped down to 20x20 pixel images. In the third version of the database,
used in some early experiments, the images were reduced to 16x16 pixels. The
regular database is available at http://www.research.att.com/yann
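
The centering step used for the regular database can be sketched as follows.
The sketch assumes ink pixels have positive values and the background is
zero, and that the translation is rounded to whole pixels; the function name
and defaults are hypothetical.

```python
def center_in_field(image, field=28, background=0.0):
    """Copy `image` into a field x field canvas, translated by a whole
    number of pixels so that its center of mass lands as close as possible
    to the center of the canvas."""
    mass = sum(sum(row) for row in image)
    cy = sum(i * v for i, row in enumerate(image) for v in row) / mass
    cx = sum(j * v for row in image for j, v in enumerate(row)) / mass
    center = (field - 1) / 2.0
    dy, dx = round(center - cy), round(center - cx)
    canvas = [[background] * field for _ in range(field)]
    for i, row in enumerate(image):
        for j, v in enumerate(row):
            if 0 <= i + dy < field and 0 <= j + dx < field:
                canvas[i + dy][j + dx] = v
    return canvas
```

Calling the same function with `field=32` on an already centered 28x28 image
would correspond to the extension with background pixels mentioned above.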
2.4 Results and Comparison with Other Classifiers

Several versions of LeNet-5 were trained on the regular database, with
typically 20 iterations through the entire training data. The test error rate
stabilizes after around 10 passes through the training set at 0.95%. The
error rate on the training set reaches 0.35% after 19 passes. The influence
of the training set size was measured by training the network with 15,000,
30,000, and 60,000 examples. The results made it clear that additional
training data would be beneficial. In another set of experiments, we
artificially generated more training examples by randomly distorting the
original training images. The increased training set was composed of the
60,000 original patterns plus 540,000 instances of distorted patterns with
randomly picked distortion parameters. The distortions were combinations of
the following planar affine transformations: horizontal and vertical
translations, scaling, squeezing (simultaneous horizontal compression and
vertical elongation, or the reverse), and horizontal shearing. Figure 2 shows
examples of distorted patterns used for training. When distorted data was
used for training, the test error rate dropped to 0.8% (from 0.95% without
deformation). Some of the misclassified examples are genuinely ambiguous, but
several are perfectly identifiable by humans, although they are written in an
under-represented style. This shows that further improvements are to be
expected with more training data.

For the sake of comparison, a variety of other trainable classifiers was
trained and tested on the same database. The error rates on the test set for
the various methods are shown in figure 3. The experiments included the
following methods: linear classification with 10 two-way classifiers trained
to classify one class from the other nine; pairwise linear classifier with 45
two-way classifiers trained to classify one class versus one other, followed
by a voting mechanism; K-Nearest Neighbor classifiers with a simple Euclidean
distance on pixel images; 40-dimension principal component analysis followed
by degree 2 polynomial classifier; radial basis function network with 1000
Gaussian RBFs trained with K-means per class and followed by a linear
classifier; Tangent Distance classifier, a nearest-neighbor classifier where
the distance is made invariant to small geometric distortions by projecting
the pattern onto linear approximations of the manifolds generated by
distorting the prototypes; Support Vector Machines of various types (regular
SVM, reduced-set SVM, virtual SVM) using polynomial kernels; fully connected
neural nets with one or two hidden layers and various numbers of hidden
units; LeNet-1, a small convolutional neural net with only 2600 free
parameters and 100,000 connections; LeNet-4, a convolutional neural net with
17,000 free parameters and 260,000 connections, similar to but slightly
different from LeNet-5; Boosted LeNet-4, a classifier obtained by voting
three instances of LeNet-4 trained on different subsets of the database; and
finally LeNet-5.

[Figure 3 bar chart. Methods listed on the axis: Linear; [deslant] Linear;
Pairwise; K-NN Euclidean; [deslant] K-NN Euclidean; 40 PCA + quadratic;
1000 RBF + linear; [16x16] Tangent Distance; SVM poly 4; RS-SVM poly 5;
[dist] V-SVM poly 9; 28x28-300-10; [dist] 28x28-300-10;
[deslant] 20x20-300-10; 28x28-1000-10; [dist] 28x28-1000-10;
28x28-300-100-10; [dist] 28x28-300-100-10; 28x28-500-150-10;
[dist] 28x28-500-150-10; [16x16] LeNet-1; LeNet-4; LeNet-4 / Local;
LeNet-4 / K-NN; LeNet-5; [dist] LeNet-5; [dist] Boosted LeNet-4.]

Fig. 3. Error rate on the test set (%) for various classification methods.
[deslant] indicates that the classifier was trained and tested on the
deslanted version of the database. [dist] indicates that the training set was
augmented with artificially distorted examples. [16x16] indicates that the
system used the 16x16 pixel images. The uncertainty in the quoted error rates
is about 0.1%.

Concerning fully-connected neural networks, it remains somewhat of a mystery
that unstructured neural nets with such a large number of free parameters
manage to achieve reasonable performance. We conjecture that the dynamics of
gradient descent learning in multilayer nets has a "self-regularization"
effect. Because the origin of weight space is a saddle point that is
attractive in almost every direction, the weights invariably shrink during
the first few epochs. Small weights cause the sigmoids to operate in the
quasi-linear region, making the network essentially equivalent to a
low-capacity, single-layer network. As the learning proceeds, the weights
grow, which progressively increases the effective capacity of the network.
This seems to be an almost perfect, if fortuitous, implementation of Vapnik's
"Structural Risk Minimization" principle [31].

The Support Vector Machine [31] has excellent accuracy, which is most
remarkable because, unlike the other high performance classifiers, it does
not include a priori knowledge about the problem [4]. In fact, this
classifier would do just as well if the image pixels were permuted with a
fixed mapping and lost their pictorial structure. However, reaching levels of
performance comparable to the Convolutional Neural Networks can only be done
at considerable expense in memory and computational requirements. The
computational requirements of Burges's reduced-set SVM are within a factor of
two of LeNet-5, and the error rate is very close. Improvements of those
results are expected, as the technique is relatively new.

Boosted LeNet-4 performed best, achieving a score of 0.7%, closely followed
by LeNet-5 at 0.8%. Boosted LeNet-4 [6] is based on theoretical work by
Schapire [29]. Three LeNet-4s are combined: the first one is trained the
usual way; the second one is trained on patterns that are filtered by the
first net so that the second machine sees a mix of patterns, 50% of which
the first net got right, and 50% of which it got wrong; finally, the third
net is trained on new patterns on which the first and the second nets
disagree. During testing, the outputs of the three nets are simply added.
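The filtering scheme described above can be sketched in a few lines. This
is only an illustration under stated assumptions: the function names, the
(pattern, label) sample format, and the convention that each net returns a
vector of per-class scores where higher is better are all hypothetical and
not taken from the original system.

```python
import random

def filter_for_second_net(samples, first_net_predict, n_wanted, rng=None):
    """Select patterns for the second net: half that the first net
    classifies correctly and half that it gets wrong (hypothetical helper)."""
    rng = rng or random.Random(0)
    right, wrong = [], []
    for x, y in samples:
        (right if first_net_predict(x) == y else wrong).append((x, y))
    half = min(n_wanted // 2, len(right), len(wrong))
    mix = rng.sample(right, half) + rng.sample(wrong, half)
    rng.shuffle(mix)
    return mix

def filter_for_third_net(samples, predict1, predict2):
    """Keep only the patterns on which the first two nets disagree."""
    return [(x, y) for x, y in samples if predict1(x) != predict2(x)]

def boosted_predict(score_fns, x):
    """Add the output vectors of the nets and return the best class index.
    Assumes higher output means more likely; flip the comparison for penalties."""
    totals = None
    for score in score_fns:
        s = score(x)
        totals = list(s) if totals is None else [a + b for a, b in zip(totals, s)]
    return max(range(len(totals)), key=totals.__getitem__)
```

In practice the three filters would be applied to a stream of fresh
patterns, since the second and third nets only see a fraction of what the
first net sees.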

When plenty of data is available, many methods can attain respectable
accuracy. Compared to other methods, convolutional neural nets offer not
only the best accuracy, but also good speed, low memory requirements, and
excellent robustness, as discussed below.

2.5 Invariance and Noise Resistance 

While fully invariant recognition of complex shapes is still an elusive
goal, it seems that convolutional networks, because of their architecture,
offer a partial answer to the problem of invariance or robustness with
respect to distortions, varying position, scale, and orientation, as well
as intrinsic class variability. Figure 4 shows several examples of unusual
and distorted characters that are correctly recognized by LeNet-5. For
these experiments, the training samples were artificially distorted using
random planar affine transformations, and the pixels in the training
images were randomly flipped with probability 0.1 to increase the noise
resistance. The top row in the figure shows the robustness to size and
orientation variations. It is estimated that accurate recognition occurs
for scale variations up to about a factor of 2, vertical shift variations
of plus or minus about half the height of the character, and rotations up
to plus or minus 30 degrees. While the characters are distorted during
training, it seems that the robustness of the network subsists for
distortions that are significantly larger than the ones used during
training. Figure 4 includes examples of characters written in very unusual
styles. Needless to say, there are no such examples in the training set.
Nevertheless, the network classifies them correctly, which seems to
suggest that the features that have been learned have some degree of
generality. Lastly, figure 4 includes examples that demonstrate LeNet-5's
robustness to extremely high levels of structured noise. Handling these
images with traditional segmentation and feature extraction techniques
would pose insurmountable problems. Even though the only noise used during
training was random pixel flipping, it seems that the network can
eliminate the adverse effects of non-sensical but structured marks from
images such as the 3 and the 8 in the second row. This demonstrates a
somewhat puzzling ability of such networks to perform (if implicitly) a
kind of elementary feature binding solely through feed-forward linear
combinations and sigmoid functions.

Fig. 4. Examples of unusual, distorted, and noisy characters correctly
recognized by LeNet-5. The grey-level of the output label represents the
penalty (lighter for higher penalties).
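The pixel-flipping noise mentioned above is simple to reproduce. The
following is a minimal sketch, assuming a binary image stored as a list of
rows of 0/1 values; the function name and the data layout are illustrative
assumptions, not the authors' code.

```python
import random

def flip_pixels(image, p=0.1, rng=None):
    """Return a copy of a binary image in which each pixel is flipped
    independently with probability p (0 becomes 1 and 1 becomes 0)."""
    rng = rng or random.Random()
    return [[1 - px if rng.random() < p else px for px in row]
            for row in image]
```

The input image is left untouched, so the same clean sample can be
degraded differently at each training epoch.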

Animated examples of LeNet-5 in action are available on the Internet at
http://www.research.att.com/~yann

3 Multiple Object Recognition with Space Displacement
Neural Networks

A major conceptual problem in vision and pattern recognition is how to
recognize individual objects when those objects cannot be easily segmented
out of their surrounding. In general, this poses the problem of feature
binding: how to identify and bind together features that belong to a
single object, while suppressing features that belong to the background or
to other objects. The common wisdom is that, except in the simplest case,
one cannot identify and bind together the features of an object unless one
knows what object to look for.

In handwriting recognition, the problem is to separate a character from
its neighbors, given that the neighbors can touch it or overlap with it.
The most common solution is called "heuristic over-segmentation". It
consists in generating a large number of potential cuts between characters
using heuristic image analysis techniques. Candidate characters are formed
by combining contiguous segments in multiple ways. The candidate
characters are then sent to the recognizer for classification and scoring.
A simple graph-search technique then finds the consistent sequence of
character candidates with the best overall score.

There is a simple alternative to explicitly segmenting images of character
strings using heuristics. The idea is to sweep a recognizer across all
possible locations on an image of the entire word or string whose height
has been normalized. With this technique, no segmentation heuristics are
required. However, there are problems with this approach. First, the
method is in general quite expensive. The recognizer must be applied at
every possible location on the input, or at least at a large enough subset
of locations so that misalignments of characters in the field of view of
the recognizers are small enough to have no effect on the error rate.
Second, when the recognizer is centered on a character to be recognized,
the neighbors of the center character will be present in the field of view
of the recognizer, possibly touching the center character. Therefore the
recognizer must be able to correctly recognize the character in the center
of its input field, even if neighboring characters are very close to, or
touching, the central character. Third, a word or character string cannot
be perfectly size normalized. Individual characters within a string may
have widely varying sizes and baseline positions. Therefore the recognizer
must be very robust to shifts and size variations.

These three problems are elegantly circumvented if a convolutional network
is replicated over the input field. First of all, as shown in the previous
section, convolutional neural networks are very robust to shifts and scale
variations of the input image, as well as to noise and extraneous marks in
the input. These properties take care of the latter two problems mentioned
in the previous paragraph. Second, convolutional networks provide a
drastic saving in computational requirement when replicated over large
input fields. A replicated convolutional network, also called a Space
Displacement Neural Network or SDNN [22], is shown in Figure 5. While
scanning a recognizer can be prohibitively expensive in general,
convolutional networks can be scanned or replicated very efficiently over
large, variable-size input fields. Consider one instance of a
convolutional net and its alter ego at a nearby location. Because of the
convolutional nature of the network, units in the two instances that look
at identical locations on the input have identical outputs, therefore
their states do not need to be computed twice. Only a thin "slice" of new
states that are not shared by the two network instances needs to be
recomputed. When all the slices are put together, the result is simply a
larger convolutional network whose structure is identical to the original
network, except that the feature maps are larger in the horizontal
dimension. In other words, replicating a convolutional network can be done
simply by increasing the size of the fields over which the convolutions
are performed, and by replicating the output layer accordingly. The output
layer effectively becomes a convolutional layer. An output whose receptive
field is centered on an elementary object will produce the class of this
object, while an in-between output may indicate no character or contain
rubbish. The outputs can be interpreted as evidences for the presence of
objects at all possible positions in the input field.

Fig. 5. A Space Displacement Neural Network is a convolutional network
that has been replicated over a wide input field.
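The equivalence between scanning a small network and enlarging its
convolutions can be checked on a toy case. The sketch below is a
deliberately reduced, one-dimensional stand-in (one feature map, tanh
units, one output unit, no sub-sampling layers); these simplifications and
all names are assumptions made for illustration, not the LeNet-5
architecture.

```python
import math

def conv1d_valid(signal, kernel):
    """Plain 'valid' correlation: one output per fully covered position."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

def small_net(window, kernel, out_weights):
    """Toy recognizer: one conv + tanh feature map, then one output unit
    fully connected to that feature map."""
    feat = [math.tanh(v) for v in conv1d_valid(window, kernel)]
    assert len(feat) == len(out_weights)
    return sum(f * w for f, w in zip(feat, out_weights))

def scan_naive(wide_input, field, kernel, out_weights):
    """Apply the whole recognizer at every horizontal offset."""
    return [small_net(wide_input[s:s + field], kernel, out_weights)
            for s in range(len(wide_input) - field + 1)]

def scan_sdnn(wide_input, kernel, out_weights):
    """Compute the feature map once over the wide input, then treat the
    output layer as one more convolution over that feature map."""
    feat = [math.tanh(v) for v in conv1d_valid(wide_input, kernel)]
    return conv1d_valid(feat, out_weights)
```

Both functions return the same outputs, but scan_sdnn evaluates each
feature-map unit once instead of once per overlapping window. With
sub-sampling layers, the replicated outputs would be spaced by the product
of the sub-sampling ratios rather than by one pixel.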

The SDNN architecture seems particularly attractive for recognizing
cursive handwriting where no obvious segmentation heuristics exist.
Although the idea of SDNN is quite old [10, 22] and very attractive by its
simplicity, it has not generated wide interest until recently because of
the enormous demands it puts on the recognizer.

Fig. 6. An example of multiple character recognition with SDNN. With SDNN,
no explicit segmentation is performed.



3.1 Interpreting the Output of an SDNN 

The output of a horizontally replicated SDNN is a sequence of vectors
which encode the likelihoods, penalties, or scores of finding a character
of a particular class label at the corresponding location in the input. A
post-processor is required to pull out the best possible label sequence
from this vector sequence. An example of SDNN output is shown in Figure 6.
Very often, individual characters are spotted by several neighboring
instances of the recognizer, a consequence of the robustness of the
recognizer to horizontal translations. Also quite often, characters are
erroneously detected by recognizer instances that see only a piece of a
character. For example, a recognizer instance that only sees the right
third of a "4" might output the label 1. How can we eliminate those
extraneous characters from the output sequence and pull out the best
interpretation? This can be done with a simple weighted finite state
machine. The sequence of vectors produced by the SDNN is first turned into
a linear graph constructed as follows. Each vector in the output sequence
is transformed into a bundle of arcs with a common source node and target
node. Each arc contains one of the possible character labels, together
with its corresponding penalty. Each bundle contains an additional arc
bearing the "none of the above" label with a penalty. These bundles are
concatenated in the order of the vector sequence (the target node of a
bundle becomes the source node of the next bundle). Each path in this
graph is a possible interpretation of the input. A grammar is then
constructed as a weighted finite-state machine that contains a model for
each character. The grammar ensures, for example, that neighboring
characters must be separated by a "none of the above" label (white space),
and that successive occurrences of the same label are probably produced by
a single input character. The grammar and the linear graph are then
composed (a graph operation similar to a tensor product). The composed
graph contains all the paths of the linear graph that happen to be
grammatically correct. The Viterbi algorithm can then be used to find the
path with the smallest overall penalty.

Fig. 7. An SDNN applied to a noisy image of a digit string. The digits
shown in the SDNN output represent the winning class labels, with a
lighter grey level for high-penalty answers.
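The post-processing described above can be illustrated with a small
dynamic program. The sketch below does not build the composed graph
explicitly; it hard-codes one simple grammar (different non-blank labels
may not appear on contiguous outputs, and a run of identical labels counts
as one character) directly into the transition test. The blank symbol, the
dictionary format of the penalty vectors, and the function names are all
assumptions made for the example.

```python
BLANK = "_"  # stands for the "none of the above" label

def allowed(prev, cur):
    """Toy grammar: two different non-blank labels cannot be adjacent."""
    return prev == cur or prev == BLANK or cur == BLANK

def decode(penalties):
    """penalties: one dict per SDNN output column, mapping label -> penalty.
    Returns the lowest-penalty grammatical interpretation and its penalty."""
    labels = list(penalties[0])
    # best[label] = (cumulated penalty, label path) of the best path ending there
    best = {l: (penalties[0][l], [l]) for l in labels}
    for column in penalties[1:]:
        new_best = {}
        for cur in labels:
            cost, path = min((best[p] for p in labels if allowed(p, cur)),
                             key=lambda cp: cp[0])
            new_best[cur] = (cost + column[cur], path + [cur])
        best = new_best
    cost, path = min(best.values(), key=lambda cp: cp[0])
    # Collapse runs of the same label and drop the blanks.
    text, prev = [], BLANK
    for l in path:
        if l != BLANK and l != prev:
            text.append(l)
        prev = l
    return "".join(text), cost
```

A real system would keep the grammar as a separate weighted finite-state
machine so that it can be changed without touching the search.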

3.2 Experiments with SDNN 

In a series of experiments, LeNet-5 was trained with the goal of being
replicated into an SDNN so as to recognize multiple characters without
segmentations. The data was generated from the previously described MNIST
set as follows. Training images were composed of a central character,
flanked by two side characters picked at random in the training set. The
separation between the bounding boxes of the characters was chosen at
random between -1 and 4 pixels. In other instances, no central character
was present, in which case the desired output of the network was the blank
space class. In addition, training images were degraded by randomly
flipping the pixels with probability 0.1.

Figures 6 and 7 show a few examples of successful recognitions of multiple
characters by the LeNet-5 SDNN. Standard techniques based on Heuristic
Over-Segmentation would likely fail on most of those examples. The
robustness of the network to scale and vertical position variations allows
it to recognize characters in such strings. More importantly, it seems
that the network is able to individually recognize the characters even
when there is a significant overlap with the neighbors. It is also able to
correctly group disconnected pieces of ink that form characters, as
exemplified in the upper half of the figure. In the top left example, the
4 and the 0 are more connected to each other than they are connected with
themselves, yet the system correctly identifies the 4 and the 0 as
separate objects. The top right example is interesting for several
reasons. First, the system correctly identifies the three individual
"1"s. Second, the left half and right half of the disconnected 4 are
correctly grouped, even though no simple proximity criterion could decide
to associate the left half of the 4 to the vertical bar on its left or on
its right. The right half of the 4 does cause the appearance of an
erroneous "1" on the SDNN output, but this "1" is removed by the grammar,
which prevents different non-blank characters from appearing on contiguous
outputs. The bottom left example demonstrates that extraneous marks that
do not belong to identifiable characters are suppressed even though they
may connect genuine characters to each other. The lower right example
shows the combined robustness to character overlaps, vertical shifts, size
variations, and noise.

Several authors have argued that invariance and feature binding for
multiple object recognition require specific mechanisms involving
feedback, explicit switching devices (3-way multiplicative connections)
[11], object-centered representations, graph matching mechanisms, or
generative models that attempt to simultaneously extract the pose and the
category of the objects. It is somewhat disconcerting to observe that the
above SDNN seems to "solve" the feature binding problem, albeit partially
and in a restricted context, even though it possesses no built-in
machinery to do it explicitly. If nothing else, these experiments show
that purely feed-forward "numerical" multi-layer systems with a fixed
architecture can emulate functions that appear combinatorial, and are
qualitatively much more complex than anticipated by most (including the
authors).

Several short animations of the LeNet-5 SDNN, including some with
characters that move on top of each other, can be viewed at
http://www.research.att.com/~yann

3.3 Face Detection and Spotting with SDNN 

An interesting application of SDNNs is object detection and spotting. The
invariance properties of Convolutional Networks, combined with the
efficiency with which they can be replicated over large fields, suggest
that they can be used for "brute force" object spotting and detection in
large images. The main idea is to train a single Convolutional Network to
distinguish images of the object of interest from images present in the
background. Once trained, the network is replicated so as to cover the
entire image to be analyzed, thereby forming a two-dimensional Space
Displacement Neural Network. The output of the SDNN is a two-dimensional
plane in which the most activated units indicate the presence of the
object of interest in the corresponding receptive field. Since the size of
the objects to be detected within the image is unknown, the image can be
presented to the network at multiple resolutions, and the results at
multiple resolutions combined. The idea has been applied to face location
[30], address block location on envelopes [33], and hand tracking in video
[24].

To illustrate the method, we will consider the case of face detection in
images as described in [30]. First, images containing faces at various
scales are collected. Those images are filtered through a zero-mean
Laplacian filter so as to remove variations in global illumination and
large-scale illumination gradients. Then, training samples of faces and
non-faces are manually extracted from these images. The face sub-images
are then size normalized so that the height of the entire face is
approximately 20 pixels, while keeping fairly large variations (within a
factor of two). The scales of background sub-images are picked at random.
A single convolutional network is trained on those samples to classify
face sub-images from non-face sub-images. When a scene image is to be
analyzed, it is first filtered through the Laplacian filter, and
sub-sampled by ratios that are successive powers of the square root of 2.
The network is replicated over each of the images at each resolution. A
simple voting technique is used to combine the results from multiple
resolutions.
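The multi-resolution schedule is easy to make concrete. The sketch below
only lists the sub-sampling ratios and the resulting image sizes; the
stopping rule (stop when the image becomes smaller than a minimum side
length) and the parameter names are assumptions, since the text does not
say how many resolutions were used.

```python
def pyramid_sizes(width, height, min_side, max_levels=32):
    """List (level, ratio, width, height) for sub-sampling ratios that are
    successive powers of the square root of 2."""
    sizes = []
    for k in range(max_levels):
        ratio = 2.0 ** (k / 2)  # sqrt(2) ** k, exact for even k
        w, h = int(width / ratio), int(height / ratio)
        if min(w, h) < min_side:  # assumed stopping rule
            break
        sizes.append((k, ratio, w, h))
    return sizes
```

With this spacing, every second level halves the resolution, so a
detector that tolerates scale changes within a factor of two sees each
object at a suitable size in at least one level.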

More recently, some authors have used Neural Networks, or other
classifiers such as Support Vector Machines, for face detection with great
success [27, 25]. Their systems are somewhat similar to the one described
above, including the idea of presenting the image to the network at
multiple scales. But since those systems do not use Convolutional
Networks, they cannot take advantage of the speedup described here, and
have to rely on other techniques, such as pre-filtering and real-time
tracking, to keep the computational requirement within reasonable limits.
In addition, because those classifiers are much less invariant to scale
variations than Convolutional Networks, it is necessary to use a large
number of multiscale images with finely-spaced scales.

4 Graph Transformer Networks 

Despite the apparent ability of the systems described in the previous
sections to solve combinatorial problems with non-combinatorial means,
there are situations where the need for compositionality and combinatorial
searches is inescapable. A good example is language modeling, and more
generally, models that involve finite-state grammars, weighted
finite-state machines, or other graph-based knowledge representations such
as finite-state transducers. The main point of this section is to show
that gradient-based learning techniques can be extended to situations
where those models are used.

It is easy to show that the modular gradient-based learning model
presented in section 1 can be applied to networks of modules whose state
variables X_n are graphs with numerical information attached to the arcs
(scalars, vectors, etc.), rather than fixed-size vectors. There are two
main conditions for this. First, the modules must produce the values on
the output graphs from the values on the input graphs through
differentiable functions. Second, the overall loss function should be
continuous and differentiable almost everywhere with respect to the
parameters. Networks of graph-manipulating modules are called Graph
Transformer Networks [3, 19].

4.1 Word Recognition with a Graph Transformer Network 

Though the Space Displacement Neural Net method presented in the previous
section is very promising for word recognition applications, the more
traditional method (and so far still the most developed) is called
heuristic over-segmentation. With this method, the word is segmented into
candidate characters using heuristic image analysis techniques.
Unfortunately, it is almost impossible to devise techniques that will
infallibly segment naturally written words into well formed characters.
This section and the next describe in detail a simple example of GTN for
reading words. The method can rely on gradient-based learning to avoid the
expensive and unreliable task of manually segmenting and hand-truthing a
database so as to train the recognizer on individual characters.

Segmentation. Given a word, a number of candidate cuts are generated with
heuristic methods. The cut generation heuristic is designed so as to
generate more cuts than necessary, in the hope that the "correct" set of
cuts will be included. Once the cuts have been generated, alternative
segmentations are best represented by a graph, called the segmentation
graph. The segmentation graph is a Directed Acyclic Graph (DAG) with a
start node and an end node. Each internal node is associated with a
candidate cut produced by the segmentation algorithm. Each arc between a
source node and a destination node is associated with an image that
contains all the ink between the cut associated with the source node and
the cut associated with the destination node. An arc is created between
two nodes if the segmenter decided that the piece(s) of ink between the
corresponding cuts could form a candidate character. Typically, each
individual piece of ink would be associated with an arc. Pairs of
successive pieces of ink would also be included, unless they are separated
by a wide gap, which is a clear indication that they belong to different
characters. Each complete path through the graph contains each piece of
ink once and only once. Each path corresponds to a different way of
associating pieces of ink together so as to form characters.
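A segmentation graph of the kind just described can be sketched as a list
of arcs between cut indices. In this illustration the pieces of ink are
numbered left to right, node i is the cut before piece i, and an arc (s, d)
covers pieces s to d-1. The gap list, the wide-gap threshold, and the
restriction to single pieces and pairs are assumptions chosen to mirror the
"typical" case in the text.

```python
def build_segmentation_graph(n_pieces, gaps, wide_gap):
    """gaps[i] is the horizontal gap between piece i and piece i + 1.
    Returns arcs (source_cut, destination_cut)."""
    # One arc per individual piece of ink.
    arcs = [(i, i + 1) for i in range(n_pieces)]
    # One arc per pair of successive pieces, unless a wide gap separates them.
    arcs += [(i, i + 2) for i in range(n_pieces - 1) if gaps[i] < wide_gap]
    return arcs

def all_paths(arcs, start, end):
    """Enumerate every complete path, i.e. every candidate segmentation."""
    if start == end:
        return [[]]
    paths = []
    for s, d in arcs:
        if s == start:
            paths += [[(s, d)] + rest for rest in all_paths(arcs, d, end)]
    return paths
```

For three pieces where only the first two are close together, the graph
has two complete paths: three separate characters, or the first two pieces
merged followed by the third.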

Recognition Transformer and Viterbi Transformer. A simple GTN to recognize
character strings is shown in Figure 8. Only the right branch of the top
half is used for recognition. The left branch is used for the training
procedure described in the next sub-section. The GTN is composed of two
main graph transformers called the recognition transformer T_rec, and the
Viterbi transformer T_vit. The goal of the recognition transformer is to
generate a graph, called the interpretation graph or recognition graph
G_int, that contains all the possible interpretations for all the possible
segmentations of the input. Each path in G_int represents one possible
interpretation of one particular segmentation of the input. The role of
the Viterbi transformer is to extract the best interpretation from the
interpretation graph.

Fig. 8. A GTN architecture for word recognition based on Heuristic
Over-Segmentation. During recognition, only the right-hand path of the top
part is used. For training with Viterbi training, only the left-hand path
is used. For discriminative Viterbi training, both paths are used.
Quantities in square brackets are penalties computed during the forward
propagation. Quantities in parentheses are partial derivatives computed
during the backward propagation.

The recognition transformer T_rec takes the segmentation graph G_seg as
input, and applies the recognizer for single characters to the images
associated with each of the arcs in the segmentation graph. The
interpretation graph G_int has almost the same structure as the
segmentation graph, except that each arc is replaced by a set of arcs from
and to the same node. In this set of arcs, there is one arc for each
possible class for the image associated with the corresponding arc in
G_seg. To each arc is attached a class label, and the penalty that the
image belongs to this class as produced by the recognizer. If the
segmenter has computed penalties for the candidate segments, these
penalties are combined with the penalties computed by the character
recognizer, to obtain the penalties on the arcs of the interpretation
graph. Although combining penalties of different nature seems highly
heuristic, the GTN training procedure will tune the penalties and take
advantage of this combination anyway. Each path in the interpretation
graph corresponds to a possible interpretation of the input word. The
penalty of a particular interpretation for a particular segmentation is
given by the sum of the arc penalties along the corresponding path in the
interpretation graph. Computing the penalty of an interpretation
independently of the segmentation requires combining the penalties of all
the paths with that interpretation. This can be done using the "forward"
algorithm widely used in Hidden Markov Models.

The Viterbi transformer produces a graph G_vit with a single path. This
path is the path of least cumulated penalty in the interpretation graph.
The result of the recognition can be produced by reading off the labels of
the arcs along the graph G_vit extracted by the Viterbi transformer. The
Viterbi transformer owes its name to the famous Viterbi algorithm to find
the shortest path in a graph.
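The two transformers can be sketched as plain functions on arc lists. This
is an illustrative reduction, not the authors' implementation: graphs are
lists of tuples, nodes are integers numbered left to right so that every
arc goes from a smaller to a larger node, and the recognizer is any
callable returning a dictionary of class penalties (a hypothetical
interface).

```python
def recognition_transformer(seg_arcs, recognizer):
    """seg_arcs: list of (src, dst, image). Each arc is replaced by one arc
    per class, carrying (src, dst, label, penalty)."""
    return [(s, d, label, penalty)
            for s, d, image in seg_arcs
            for label, penalty in recognizer(image).items()]

def viterbi_transformer(int_arcs, start, end):
    """Return (cumulated penalty, arcs) of the lowest-penalty path.
    Relies on the left-to-right node numbering: when the arcs leaving a
    node are processed, all arcs entering it have already been seen."""
    best = {start: (0.0, [])}
    for arc in sorted(int_arcs, key=lambda a: a[0]):
        s, d, _label, penalty = arc
        if s not in best:
            continue  # node not reachable from the start node
        cost = best[s][0] + penalty
        if d not in best or cost < best[d][0]:
            best[d] = (cost, best[s][1] + [arc])
    return best[end]
```

Reading the labels off the returned arcs gives the recognized string.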

4.2 Gradient-Based Training of a GTN 

The previous section describes the process of recognizing a string using
Heuristic Over-Segmentation, assuming that the recognizer is trained so as
to assign low penalties to the correct class label of correctly segmented
characters, high penalties to erroneous categories of correctly segmented
characters, and high penalties to all categories for badly formed
characters. This section explains how to train the system at the string
level to do the above without requiring manual labeling of character
segments.

In many applications, there is enough a priori knowledge about what is
expected from each of the modules in order to train them separately. For
example, with Heuristic Over-Segmentation one could individually label
single-character images and train a character recognizer on them, but it
might be difficult to obtain an appropriate set of non-character images to
train the model to reject wrongly segmented candidates. Although separate
training is simple, it requires additional supervision information that is
often lacking or incomplete (the correct segmentation and the labels of
incorrect candidate segments). The following section describes two of the
many gradient-based methods for training GTN-based handwriting recognizers
at the string level: Viterbi training and discriminative Viterbi training.
Unlike similar approaches in the context of speech recognition, we make no
recourse to a probabilistic interpretation, but show that, within the
Gradient-Based Learning approach, discriminative training is a simple
instance of the pervasive principle of error correcting learning.



Viterbi Training. During recognition, we select the path in the
Interpretation Graph that has the lowest penalty with the Viterbi
algorithm. Ideally, we would like this path of lowest penalty to be
associated with the correct label sequence as often as possible. An
obvious loss function to minimize is therefore the average over the
training set of the penalty of the path associated with the correct label
sequence that has the lowest penalty. The goal of training will be to find
the set of recognizer parameters (the weights, if the recognizer is a
neural network) that minimize the average penalty of this "correct" lowest
penalty path. The gradient of this loss function can be computed by
back-propagation through the GTN architecture shown in figure 8, using
only the left-hand path of the top part, and ignoring the right half. This
training architecture contains a graph transformer called a path selector,
inserted between the Interpretation Graph and the Viterbi Transformer.
This transformer takes the interpretation graph and the desired label
sequence as input. It extracts from the interpretation graph those paths
that contain the correct (desired) label sequence. Its output graph G_c is
called the constrained interpretation graph, and contains all the paths
that correspond to the correct label sequence. The constrained
interpretation graph is then sent to the Viterbi transformer which
produces a graph G_cvit with a single path. This path is the "correct"
path with the lowest penalty. Finally, a path scorer transformer takes
G_cvit, and simply computes its cumulated penalty C_cvit by adding up the
penalties along the path. The output of this GTN is the loss function for
the current pattern:



$$E_{vit} = C_{cvit} \qquad (3)$$



The only label information that is required by the above system is the
sequence of desired character labels. No knowledge of the correct
segmentation is required on the part of the supervisor, since the system
chooses among the segmentations in the interpretation graph the one that
yields the lowest penalty.

The process of back-propagating gradients through the Viterbi training GTN
is now described. As explained in section 1, the gradients must be
propagated backwards through all modules of the GTN, in order to compute
gradients in preceding modules and thereafter tune their parameters.
Back-propagating gradients through the path scorer is quite
straightforward. The partial derivatives of the loss function with respect
to the individual penalties on the constrained Viterbi path G_cvit are
equal to 1, since the loss function is simply the sum of those penalties.
Back-propagating through the Viterbi Transformer is equally simple. The
partial derivatives of E_vit with respect to the penalties on the arcs of
the constrained graph G_c are 1 for those arcs that appear in the
constrained Viterbi path G_cvit, and 0 for those that do not. Why is it
legitimate to back-propagate through an essentially discrete function such
as the Viterbi Transformer? The answer is that the Viterbi Transformer is
nothing more than a collection of min functions and adders put together.
It can be shown easily that gradients can be back-propagated through min
functions without adverse effects. Back-propagation through the path
selector transformer is similar to back-propagation through the Viterbi
transformer. Arcs in G_int that appear in G_c have the same gradient as
the corresponding arc in G_c, i.e. 1 or 0, depending on whether the arc
appears in G_cvit. The other arcs, i.e. those that do not have an alter
ego in G_c because they do not contain the right label, have a gradient of
0. During the forward propagation through the recognition transformer, one
instance of the recognizer for single characters was created for each arc
in the segmentation graph. The state of recognizer instances was stored.
Since each arc penalty in G_int is produced by an individual output of a
recognizer instance, we now have a gradient (1 or 0) for each output of
each instance of the recognizer. Recognizer outputs that have a non zero
gradient are part of the correct answer, and will therefore have their
value pushed down. The gradients present on the recognizer outputs can be
back-propagated through each recognizer instance. For each recognizer
instance, we obtain a vector of partial derivatives of the loss function
with respect to the recognizer instance parameters. All the recognizer
instances share the same parameter vector, since they are merely clones of
each other, therefore the full gradient of the loss function with respect
to the recognizer's parameter vector is simply the sum of the gradient
vectors produced by each recognizer instance. Viterbi training, though
formulated differently, is often used in HMM-based speech recognition
systems [26].
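The forward and backward passes of Viterbi training can be sketched on a
tiny interpretation graph. The sketch below enumerates paths explicitly
instead of running the Viterbi algorithm, which is only practical for toy
graphs; the arc format (src, dst, label, penalty), the node numbering, and
the function names are assumptions for illustration.

```python
def enumerate_paths(arcs, node, end):
    """Yield every path (list of arcs) from node to end in a DAG."""
    if node == end:
        yield []
        return
    for arc in arcs:
        if arc[0] == node:
            for rest in enumerate_paths(arcs, arc[1], end):
                yield [arc] + rest

def path_penalty(path):
    """Path scorer: add up the penalties along the path."""
    return sum(arc[3] for arc in path)

def viterbi_training_step(int_arcs, start, end, target):
    """Return the loss E_vit = C_cvit and dE_vit/d(penalty) for every arc."""
    # Path selector: keep the paths that spell the desired label sequence.
    correct = [p for p in enumerate_paths(int_arcs, start, end)
               if [a[2] for a in p] == list(target)]
    # Viterbi transformer on the constrained graph (raises if none exists).
    g_cvit = min(correct, key=path_penalty)
    loss = path_penalty(g_cvit)
    # Gradient is 1 on the arcs of the constrained Viterbi path, 0 elsewhere.
    grads = {arc: (1.0 if arc in g_cvit else 0.0) for arc in int_arcs}
    return loss, grads
```

Each arc gradient would then be passed back into the recognizer instance
that produced the arc, and the per-instance parameter gradients summed.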

While it seems simple and satisfying, this training architecture has a
flaw that can potentially be fatal. If the recognizer is a simple neural
network with sigmoid output units, the minimum of the loss function is
attained, not when the recognizer always gives the right answer, but when
it ignores the input, and sets its output to a constant vector with small
values for all the components. This is known as the collapse problem. The
collapse only occurs if the recognizer outputs can simultaneously take
their minimum value. If on the other hand the recognizer's output layer
contains RBF units with fixed parameters, then there is no such trivial
solution. This is due to the fact that a set of RBFs with fixed distinct
parameter vectors cannot simultaneously take their minimum value. In this
case, the complete collapse described above does not occur. However, this
does not totally prevent the occurrence of a milder collapse because the
loss function still has a "flat spot" for a trivial solution with constant
recognizer output. This flat spot is a saddle point, but it is attractive
in almost all directions and is very difficult to get out of using
gradient-based minimization procedures. If the parameters of the RBFs are
allowed to adapt, then the collapse problem reappears because the RBF
centers can all converge to a single vector, and the underlying neural
network can learn to produce that vector, and ignore the input. A
different kind of collapse occurs if the widths of the RBFs are also
allowed to adapt. The collapse only occurs if a trainable module such as a
neural network feeds the RBFs. Another problem with Viterbi training is
that the penalty of the answer cannot be used reliably as a measure of
confidence because it does not take low-penalty (or high-scoring)
competing answers into account. A simple way to address this problem and
to avoid the collapse is to train the whole system with a discriminative
loss function as described in the next section.



Discriminative Viterbi Training. The idea of discriminative Viterbi
training is to not only minimize the cumulated penalty of the lowest
penalty path with the correct interpretation, but also to somehow increase
the penalty of competing and possibly incorrect paths that have a
dangerously low penalty. This type of criterion is called discriminative
because it plays the good answers against the bad ones. Discriminative
training procedures can be seen as attempting to build appropriate
separating surfaces between classes rather than to model individual
classes independently of each other.

One example of discriminative criterion is the difference between the
penalty of the Viterbi path in the constrained graph, and the penalty of
the Viterbi path in the (unconstrained) interpretation graph, i.e. the
difference between the penalty of the best correct path, and the penalty
of the best path (correct or incorrect). The corresponding GTN training
architecture is shown in figure 8. The left side of the diagram is
identical to the GTN used for non-discriminative Viterbi training. This
loss function reduces the risk of collapse because it forces the
recognizer to increase the penalty of wrongly recognized objects.
Discriminative training can also be seen as another example of error
correction procedure, which tends to minimize the difference between the
desired output computed in the left half of the GTN in figure 8 and the
actual output computed in the right half of figure 8.

Let the discriminative Viterbi loss function be denoted E_dvit, and let us call
C_cvit the penalty of the Viterbi path in the constrained graph and C_vit the
penalty of the Viterbi path in the unconstrained interpretation graph:

    E_dvit = C_cvit - C_vit    (4)

E_dvit is never negative, since the paths of the constrained graph are a subset
of the paths in the interpretation graph, and the Viterbi algorithm selects the
path with the lowest total penalty. In the ideal case, the two Viterbi paths
coincide, so that C_cvit equals C_vit and E_dvit is zero.
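
The loss in equation (4) can be sketched on a toy interpretation graph. This is
a minimal illustration and not the authors' implementation: the arc list, the
labels, the penalties, and the helper names `viterbi` and
`discriminative_viterbi_loss` are all assumptions made for the example. The
constrained graph is handled implicitly by allowing only paths whose label
sequence equals the desired one.

```python
import math

# Toy interpretation graph (all labels and penalties are made-up values).
# Each arc is (source node, target node, label, penalty); nodes are numbered
# so that every arc goes from a lower-numbered to a higher-numbered node.
ARCS = [
    (0, 1, "3", 0.50),
    (0, 1, "8", 0.25),
    (1, 2, "4", 0.25),
    (1, 2, "9", 0.75),
    (0, 2, "5", 1.50),  # alternative segmentation: one wide character
]


def viterbi(arcs, start, end, target=None):
    """Return (penalty, arc indices) of the lowest-penalty path from start to end.

    If target is given, only paths whose label sequence equals target are
    considered, which plays the role of the constrained graph.
    """
    # State = (node, number of target labels consumed); unconstrained uses 0.
    best = {(start, 0): (0.0, [])}
    # Visit arcs by increasing source node so each state is final when used.
    for idx, (src, dst, label, pen) in sorted(enumerate(arcs), key=lambda t: t[1][0]):
        for (node, k), (cost, path) in list(best.items()):
            if node != src:
                continue
            if target is not None:
                if k >= len(target) or target[k] != label:
                    continue
                next_k = k + 1
            else:
                next_k = 0
            candidate = (cost + pen, path + [idx])
            if (dst, next_k) not in best or candidate[0] < best[(dst, next_k)][0]:
                best[(dst, next_k)] = candidate
    final = (end, len(target) if target is not None else 0)
    return best.get(final, (math.inf, []))


def discriminative_viterbi_loss(arcs, start, end, target):
    """E_dvit = C_cvit - C_vit, together with the two Viterbi paths."""
    c_cvit, cvit_path = viterbi(arcs, start, end, target)
    c_vit, vit_path = viterbi(arcs, start, end)
    return c_cvit - c_vit, cvit_path, vit_path


loss, cvit_path, vit_path = discriminative_viterbi_loss(ARCS, 0, 2, ["3", "4"])
print(loss, cvit_path, vit_path)
```

With these values the best correct path ("3", "4") has penalty 0.75 and the best
overall path ("8", "4") has penalty 0.5, so the loss is 0.25. If the desired
answer were ("8", "4"), the two paths would coincide and the loss would be zero.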

Back-propagating gradients through the discriminative Viterbi GTN adds
some "negative" training to the previously described non-discriminative training.
The figure shows how the gradients are back-propagated. The left half is identical
to the non-discriminative Viterbi training GTN, therefore the back-propagation
is identical. The gradients back-propagated through the right half of the GTN
are multiplied by -1, since C_vit contributes to the loss with a negative sign.
Otherwise the process is similar to the left half. The gradients on arcs of G_int
get positive contributions from the left half and negative contributions from the
right half. The two contributions must be added, since the penalties on G_int arcs
are sent to the two halves through a "Y" connection in the forward pass. Arcs in
G_int that appear neither in G_vit nor in G_cvit have a gradient of zero. They do not
contribute to the cost. Arcs that appear in both G_vit and G_cvit also have zero
gradient. The -1 contribution from the right half cancels the +1 contribution
from the left half. In other words, when an arc is rightfully part of the answer,
there is no gradient. If an arc appears in G_cvit but not in G_vit, the gradient is
+1. The arc should have had a lower penalty to make it to G_vit. If an arc is in
G_vit but not in G_cvit, the gradient is -1. The arc had a low penalty, but should
have had a higher penalty since it is not part of the desired answer. Variations of
this technique have been used for speech recognition. Driancourt and Bottou
[5] used a version of it where the loss function is saturated to a fixed value.
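
The sign rules above can be written out as a short sketch. It assumes that the
two Viterbi paths are already known and are given as lists of arc indices into
the interpretation graph; the function name `arc_gradients` and the example
indices are hypothetical.

```python
def arc_gradients(num_arcs, cvit_path, vit_path):
    """Gradient of E_dvit = C_cvit - C_vit with respect to each arc penalty.

    cvit_path and vit_path are lists of arc indices into the interpretation
    graph. The two contributions are added because the same arc penalty
    feeds both halves of the network.
    """
    grads = [0.0] * num_arcs
    for idx in cvit_path:  # left half: C_cvit enters the loss with sign +1
        grads[idx] += 1.0
    for idx in vit_path:   # right half: C_vit enters the loss with sign -1
        grads[idx] -= 1.0
    return grads


# Five arcs; the best correct path uses arcs 0 and 2, the best path overall
# uses arcs 1 and 2 (illustrative indices).
grads = arc_gradients(5, cvit_path=[0, 2], vit_path=[1, 2])
print(grads)  # [1.0, -1.0, 0.0, 0.0, 0.0]
```

Arc 2 lies on both paths and receives no gradient, arc 0 should have had a lower
penalty (+1), arc 1 should have had a higher penalty (-1), and the arcs on
neither path are left alone.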

An important advantage of global and discriminative training is that learning
focuses on the most important errors, and the system learns to integrate the
ambiguities from the segmentation algorithm with the ambiguities of the character
recognizer. There are other training procedures than the ones described
here, some of which are described in [19]. Complex Graph Transformer modules
that combine interpretation graphs with language models can be used to take
linguistic constraints into account [19].

5 Conclusion 

The methods described in this paper confirm what the history of Pattern
Recognition has already shown repeatedly: finding ways to increase the role of
learning and statistical estimation almost invariably improves the performance of
recognition systems. For 2D shape recognition, Convolutional Neural Networks have
been shown to eliminate the need for hand-crafted feature extractors. Replicated
Convolutional Networks have been shown to handle fairly complex instances of
the feature binding problem with a completely feed-forward, trained architecture
instead of the more traditional combinatorial hypothesis testing methods.
In situations where multiple hypothesis testing is unavoidable, trainable Graph
Transformer Networks have been shown to reduce the need for hand-crafted
heuristics, manual labeling, and manual parameter tuning in document recognition
systems.

Globally-trained Graph Transformer Networks have been applied successfully
to on-line handwriting recognition and check recognition [19]. The check
recognition system based on this concept is used commercially in several banks
across the US and reads millions of checks per day. The concepts and results in
this paper help establish the usefulness and relevance of gradient-based
minimization methods as a general organizing principle for learning in large
systems. It is clear that Graph Transformer Networks can be applied to many
situations where the domain knowledge or the state information can be represented
by graphs. This is the case in many visual tasks where graphs can represent
alternative interpretations of a scene, multiple instances of an object, or
relationships between